ä¸å®šæœŸML&NLPå ±#1        

The other day I sat in on the front-end meetup that is held regularly at our company (it is also distributed as a podcast). We chatted while browsing the site jser.info, which seemed like quite a good resource for keeping up with recent front-end trends.

Our machine learning study group also chats over lunch about recent topics, and since the list of entries we discuss and our comments on them seem perfectly fine to publish outside the company, I decided to put them out as an irregular report. My own NLP coverage is limited, and I can cover very little of machine learning outside NLP, so I also set up a tip submission form. Useful tips are always welcome :)

Papers

Blog posts / study group materials

Business

Conferences / study groups

Coling2016

Held in Osaka this year.

NIPS2016

NL Research Meeting (the 229th Meeting of the SIG on Natural Language Processing)

(2) [NLC] An Attempt at Efficiently Collecting Dialogue Logs Using Gamification
Shin Kanouchi, Mamoru Komachi (Tokyo Metropolitan University)

Since there are now plenty of methods for learning once data has been collected, I am more interested in how to collect data efficiently in the first place.

(5) [NL] Automatic Evaluation of the Naturalness of Topic Transitions in Chat-Oriented Dialogue Systems
Akihiro Toyoshima (NAIST), Hiroaki Sugiyama (NTT), Koichiro Yoshino, Satoshi Nakamura (NAIST)

(20) [NL] 14:30 – 15:00
An Automatic Evaluation Metric for Japanese-English Machine Translation Using Word Alignment Based on Word Embeddings
Junki Matsuo, Mamoru Komachi (Tokyo Metropolitan University), Katsuhito Sudoh (NTT)

Along with data collection, I think evaluation of non-analysis (generation-style) tasks will become a hot topic going forward.

(15) [NL] 17:25 – 17:55
Operating NEologd, a Dictionary Generation System for Word Segmentation: Document Classification as an Example
Toshinori Sato, Taiichi Hashimoto (LINE), Manabu Okumura (Tokyo Institute of Technology)

There was also a talk on NEologd, which has recently come to be used in many places.

Annual Meeting of the Association for Natural Language Processing 2017 (NLP2017)

The tutorials, themed sessions, and workshop contents have also been announced.

Crowdsourcing
Prof. Yukino Baba (Kyoto University)
Neural Machine Translation
Dr. Toshiaki Nakazawa (JST)
Universal Dependencies
Dr. Hiroshi Kanayama (IBM Research - Tokyo)
Dr. Takaaki Tanaka (NTT Communication Science Laboratories)
Cognitive Linguistics
Prof. Yoshiki Nishimura (The University of Tokyo)

I am particularly interested in neural machine translation and Universal Dependencies.

IM-nomi 2016 (an informal drinks meetup)

Other


          Deep Learning: A Practitioner’s Approach        

eBook Details: Paperback: 536 pages Publisher: WOW! eBook; 1st edition (August 20, 2017) Language: English ISBN-10: 1491914254 ISBN-13: 978-1491914250 eBook Description: Deep Learning: A Practitioner’s Approach



          Musing: Movidius Neural Compute Stick: Deep Learning and AI on a $79 USB Stick        

Could we use AI in network devices?



          Running Google's Open-Source word2vec on Chinese Data        
http://www.cnblogs.com/hebin/p/3507609.html

I had long heard that word2vec works remarkably well for word-to-word similarity, so I recently tried running Google's open-source code myself (https://code.google.com/p/word2vec/).

1. Corpus

First, prepare the data: I used the full-web news dataset (SogouCA) recommended on various blogs, 2.1 GB in size.

Download the package SogouCA.tar.gz from the FTP server:
wget ftp://ftp.labs.sogou.com/Data/SogouCA/SogouCA.tar.gz --ftp-user=hebin_hit@foxmail.com --ftp-password=4FqLSYdNcrDXvNDi -r

Unpack the archive:

gzip -d SogouCA.tar.gz
tar -xvf SogouCA.tar

Then merge the generated txt files into SogouCA.txt, extract the lines that contain <content> and convert the encoding, which yields the corpus corpus.txt, 2.7 GB in size.

cat *.txt > SogouCA.txt
cat SogouCA.txt | iconv -f gbk -t utf-8 -c | grep "<content>" > corpus.txt

2. Word segmentation

Segment corpus.txt with ANSJ to obtain the segmentation result resultbig.txt, 3.1 GB in size.

For the ANSJ segmentation tool, see http://blog.csdn.net/zhaoxinfan/article/details/10403917
In the seg_tool directory, compile and then run the tool to obtain the segmentation result resultbig.txt, which contains 426,221 distinct words and 572,308,385 tokens in total.
Sample segmentation output (screenshot):
3. Training word vectors with the word2vec tool
nohup ./word2vec -train resultbig.txt -output vectors.bin -cbow 0 -size 200 -window 5 -negative 0 -hs 1 -sample 1e-3 -threads 12 -binary 1 &

vectors.bin is the word vector file that word2vec produces from resultbig.txt; training took about an hour and a half on a lab server.
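For readers who prefer Python, here is a minimal sketch of an equivalent training step using the gensim library (my assumption; the original post uses Google's C binary), with the same hyperparameters as the command above and gensim 4.x parameter names:

from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

# resultbig.txt: one pre-segmented sentence per line, tokens separated by spaces
sentences = LineSentence("resultbig.txt")
model = Word2Vec(
    sentences,
    vector_size=200,  # -size 200
    window=5,         # -window 5
    sg=1,             # -cbow 0 (skip-gram)
    hs=1,             # -hs 1
    negative=0,       # -negative 0
    sample=1e-3,      # -sample 1e-3
    workers=12,       # -threads 12
)
model.wv.save_word2vec_format("vectors.bin", binary=True)  # same format as -binary 1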

4. Analysis
4.1 Computing similar words:
./distance vectors.bin

./distance can be thought of as computing the distance between words: each word is a point in the vector space, and distance measures how far apart two points are in that space.

Here are some examples:

4.2 Latent linguistic regularities

After modifying demo-analogy.sh, I obtained a few examples like the following:
The capital of France is Paris and the capital of the UK is London: vector("法国") - vector("巴黎") + vector("英国") --> vector("伦敦") (France - Paris + UK --> London)
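The same analogy can be queried in Python with gensim's KeyedVectors (a sketch assuming the vectors.bin file produced above):

from gensim.models import KeyedVectors

wv = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)
# vector("法国"/France) - vector("巴黎"/Paris) + vector("英国"/UK) should rank "伦敦"/London highly
print(wv.most_similar(positive=["法国", "英国"], negative=["巴黎"], topn=5))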

4.3 Clustering

Cluster the words in the segmented corpus resultbig.txt and sort them by class:

nohup ./word2vec -train resultbig.txt -output classes.txt -cbow 0 -size 200 -window 5 -negative 0 -hs 1 -sample 1e-3 -threads 12 -classes 500 &
sort classes.txt -k 2 -n > classes_sorted_sogouca.txt
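The -classes option runs k-means inside word2vec; a rough Python equivalent (an assumed sketch using scikit-learn on vectors trained as above) would be:

from gensim.models import KeyedVectors
from sklearn.cluster import KMeans

wv = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)
labels = KMeans(n_clusters=500, n_init=10, random_state=0).fit_predict(wv.vectors)

# Group words by cluster id, mirroring the sorted classes.txt output
clusters = {}
for word, label in zip(wv.index_to_key, labels):
    clusters.setdefault(int(label), []).append(word)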

For example:

4.4 Phrase analysis

First derive from the segmented corpus resultbig.txt a file sogouca_phrase.txt that contains both words and phrases, then train vector representations for the words and phrases in that file.

./word2phrase -train resultbig.txt -output sogouca_phrase.txt -threshold 500 -debug 2
./word2vec -train sogouca_phrase.txt -output vectors_sogouca_phrase.bin -cbow 0 -size 300 -window 10 -negative 0 -hs 1 -sample 1e-3 -threads 12 -binary 1
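A comparable phrase-detection pass can be sketched in Python with gensim's Phrases model (an illustration only; its scoring threshold is not identical to word2phrase's -threshold):

from gensim.models.word2vec import LineSentence
from gensim.models.phrases import Phrases

sentences = LineSentence("resultbig.txt")
bigrams = Phrases(sentences, min_count=5, threshold=500.0)

# Write out a phrased copy of the corpus, analogous to sogouca_phrase.txt
with open("sogouca_phrase.txt", "w", encoding="utf-8") as out:
    for sentence in sentences:
        out.write(" ".join(bigrams[sentence]) + "\n")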

Here are a few similarity examples:

5. References:

1. word2vec:Tool for computing continuous distributed representations of words,https://code.google.com/p/word2vec/

2. Playing with Google's open-source deep learning project word2vec in Chinese, http://www.cnblogs.com/wowarsenal/p/3293586.html

3. Clustering keywords with word2vec, http://blog.csdn.net/zhaoxinfan/article/details/11069485

6. Papers to read carefully next:

[1] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient Estimation of Word Representations in Vector Space. In Proceedings of Workshop at ICLR, 2013.
[2] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. Distributed Representations of Words and Phrases and their Compositionality. In Proceedings of NIPS, 2013.
[3] Tomas Mikolov, Wen-tau Yih, and Geoffrey Zweig. Linguistic Regularities in Continuous Space Word Representations. In Proceedings of NAACL HLT, 2013.

[4] Collobert R, Weston J, Bottou L, et al. Natural language processing (almost) from scratch[J]. The Journal of Machine Learning Research, 2011, 12: 2493-2537.

 




          Spring 2017 tech reading        
Hello and a belated happy new year to you! Here's another big list of articles I thought was worth sharing. As always thanks to the authors who wrote these articles and to the people who shared them on Twitter/HackerNews/etc.

Distributed systems (and even plain systems)

Tuning

SQL lateral view

Docker and containers

Science and math

Golang

Java streams and reactive systems

Java Lambdas

Just Java

General and/or fun

Until next time!

          Millimeter-Scale Computers: Now With Deep-Learning Neural Networks on Board        
          Stuff The Internet Says On Scalability For July 28th, 2017        

Hey, it's HighScalability time:

 

Jackson Pollock painting? Cortical column? Nope, it's a 2 trillion particle cosmological simulation using 4000+ GPUs. (paper, Joachim Stadel, UZH)

If you like this sort of Stuff then please support me on Patreon.

 

  • 1.8x: faster code on iPad MacBook Pro; 1 billion: WhatsApp daily active users; 100 milliamps: heart stopping current; $25m: surprisingly low take from ransomware; 2,700x: improvement in throughput with TCP BBR; 620: Uber locations; $35.5 billion: Facebook's cash hoard; 2 billion: Facebook monthly active users; #1: Apple is the world's most profitable [legal] company; 500,000x: return on destroying an arms depot with a drone; 

  • Quotable Quotes:
    • Alasdair Allan: Jeff Bezos’ statement that “there’s not that much interesting about CubeSats” may well turn out to be the twenty first century’s “nobody needs more than 640kb.”
    • @hardmaru: Decoding the Enigma with RNNs. They trained a LSTM with 3000 hidden units to decode ciphertext with 96%+ accuracy. 
    • @tj_waldorf: Morningstar achieved 97% cost reduction by moving to AWS. #AWSSummit Chicago
    • Ed Sperling: Moore’s Law is alive and well, but it is no longer the only approach. And depending on the market or slice of a market, it may no longer be the best approach.
    • @asymco: With the end of Shuffle and Nano iPods Apple now sells only Unix-enabled products. Amazing how far that Bell Labs invention has come.
    • @peteskomoroch: 2017: RAM is the new Hadoop
    • Carlo Pescio: What if focusing on the problem domain, while still understanding the machine that will execute your code, could improve maintainability and collaterally speed up execution by a factor of over 100x compared to popular hipster code?
    • @stevesi: Something ppl forget: moving products to cloud, margins go down due to costs to operate scale services—costs move from Customer to vendor.
    • @brianalvey: The most popular software for writing fiction isn't Word. It's Excel.
    • @pczarkowski: How to make a monolithic app cloud native: 1) run it in a docker 2) change the url from .com to .io
    • drinkzima: There is a huge general misunderstanding in the profitability of directing hotel bookings vs flight bookings or other types of travel consumables. Rate parity and high commission rates mean that directing hotel rooms is hugely profitable and Expedia (hotels.com, trivago, expedia) and Priceline (booking.com) operate as a duopoly in most markets. They are both marketing machines that turn brand + paid traffic into highly profitable room nights.
    • Animats: This is a classic problem with AI researchers. Somebody gets a good result, and then they start thinking strong human-level AI is right around the corner. AI went through this with search, planning, the General Problem Solver, perceptrons, the first generation of neural networks, and expert systems. Then came the "AI winter", late 1980s to early 2000s, when almost all the AI startups went bust. We're seeing some of it again in the machine learning / deep neural net era.
    • Charity Majors: So no, ops isn't going anywhere. It just doesn't look like it used to. Soon it might even look like a software engineer.
    • @mthenw: As long as I need to pay for idle it’s not “serverless”. Pricing is different because in Lambda you pay for invocation not for the runtime.
    • Kelly Shortridge: The goal is to make the attacker uncertain of your defensive environment and profile. So you really want to mess with their ability to profile where their target is
    • @CompSciFact: 'About 1,000 instructions is a reasonable upper limit for the complexity of problems now envisioned.' -- John von Neumann, 1946
    • hn_throwaway_99: Few barriers to entry, really?? Sorry, but this sounds a bit like an inexperienced developer saying "Hey, I could build most of Facebook's functionality in 2 weeks." Booking.com is THE largest spender of advertising on Google. They have giant teams that A/B test the living shite out of every pixel on their screens, and huge teams of data scientists squeezing out every last bit of optimization on their site. It's a huge barrier to entry. 
    • callahad: It's real [performance improvements]. We've [Firefox] landed enormous performance improvements this year, including migrating most Firefox users to a full multi-process architecture, as well as integrating parts of the Servo parallel browser engine project into Firefox. There are still many improvements yet-to-land, but in most cases we're on track for Firefox 57 in November.
    • Samer Buna: One important threat that GraphQL makes easier is resource exhaustion attacks (AKA Denial of Service attacks). A GraphQL server can be attacked with overly complex queries that will consume all the resources of the server.
    • wheaties: This is stupid. Really. Here we are in a world where the companies that own the assets (you know, the things that cost a lot of money) are worth less than the things that don't own anything. This doesn't seem "right" or "fair" in the sense that Priceline should be a middleman, unable to exercise any or all pricing power because it does not control the assets producing the revenue. I wonder how long this can last?
    • platz: Apparently deep-learning and algae are the same thing.
    • @CompSciFact: "If you don't run experiments before you start designing a new system, your entire system will be an experiment." -- Mike Williams
    • Scott Aaronson: our laws of physics are structured in such a way that even pure information often has “nowhere to hide”: if the bits are there at all in the abstract machinery of the world, then they’re forced to pipe up and have a measurable effect. 
    • The Internet said many more interesting things this week. To read them all please click through to the full article.

  • Cool interview with Margaret Hamilton--NASA's First Software Engineer--on Makers. Programmers, you'll love this. One of the stories she tells is how her daughter was playing around and selected the prelaunch program during flight. That crashed the simulator. So like a good programmer she wanted to prevent this from happening. She tried to get a protection put in because an astronaut could actually do this during flight. Management would certainly allow this, right? She was denied. They said astronauts are trained never to make a mistake so it could never happen. Eventually she won the argument and was able to add code to protect against human error. So little has changed :-)

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...


          Stuff The Internet Says On Scalability For July 21st, 2017        

Hey, it's HighScalability time:

Afraid of AI? Fire ants have sticky pads so they can form rafts, build towers, cross streams, & order takeout. We can CRISPR these guys to fight Skynet. (video, video, paper)

If you like this sort of Stuff then please support me on Patreon.

 

  • 222x: Bitcoin less efficient than a physical system of metal coins and paper/fabric/plastic; #1: Python use amongst Spectrum readers; 3x: time spent in apps that don't make us happy; 1 million: DigitalOcean users; 11.6 million: barrels of oil a day saved via tech and BigData; 200,000: cores on Cray super computer;$200B: games software/hardware revenue by 2021; $3K: for 50 Teraflops AMD Vega Deep Learning Box; 24.4 Gigawatts: China New Solar In First Half Of 2017; 

  • Quotable Quotes:
    • sidlls: I think instead there is a category error being made: that CS is an appropriate degree (on its own) to become a software engineer. It's like suggesting a BS in Physics qualifies somebody to work as an engineer building a satellite.
    • Elon Musk: AI is a fundamental existential risk for human civilization, and I don’t think people fully appreciate that
    • Mike Elgan: Thanks to machine learning, it's now possible to create a million different sensors in software using only one actual sensor -- the camera.
    • Amin Vahdat (Google): The Internet is no longer about just finding a path, any path, between a pair of servers, but actually taking advantage of the rich connectivity to deliver the highest levels of availability, the best performance, the lowest latency. Knowing this, how you would design protocols is now qualitatively shifted away from pairwise decisions to more global views.
    • naasking: You overestimate AI. Incompleteness is everywhere in CS. Overcoming these limitations is not trivial at all.
    • 451: Research believes serverless is poised to undergo a round of price cutting this year.
    • Nicholas Bloom: We found massive, massive improvement in performance—a 13% improvement in performance from people working at home
    • @CoolSWEng: "A Java new operation almost guarantees a cache miss. Get rid of them and you'll get C-like performance." - @cliff_click #jcrete
    • DarkNetMarkets: We're literally funding our own investigation. 
    • Tristan Harris: By shaping the menus we pick from, technology hijacks the way we perceive our choices and replaces them with new ones. But the closer we pay attention to the options we’re given, the more we’ll notice when they don’t actually align with our true needs.
    • xvaier: If I have one thing to tell anyone who is looking for business ideas to try out their new programming skills on, I strongly suggest taking the time to learn as much as possible about the people to whom you want to provide a solution, then recruiting one of them to help you build it, lest you become another project that solves a non-issue beautifully.
    • @sebgoa: Folks, there were schedulers before kubernetes. Let's get back down to earth quickly
    • Mark Shead: A finite state machine is a mathematical abstraction used to design algorithms. In simple terms, a state machine will read a series of inputs. When it reads an input it will switch to a different state. Each state specifies which state to switch for a given input. This sounds complicated but it is really quite simple. (A short illustrative sketch follows this list of quotes.)
    • xantrel: I started a small business that started to grow, I thought I had to migrate to AWS and increase my cost by 5xs eventually, but so far Digital Ocean with their hosted products and block storage has handled the load amazingly well.
    • danluu: when I’m asked to look at a cache related performance bug, it’s usually due to the kind of thing we just talked about: conflict misses that prevent us from using our full cache effectively. This isn’t the only way for that to happen – bank conflicts and false dependencies are also common problems
    • Charles Hoskinson: People say ICOs (Initial Coin Offering) are great for Ethereum because, look at the price, but it’s a ticking time-bomb. There’s an over-tokenization of things as companies are issuing tokens when the same tasks can be achieved with existing blockchains. People are blinded by fast and easy money.
    • Charles Schwab: There don't seem to be any classic bubbles near bursting at the moment—at least not among the ones most commonly referenced as potential candidates.
    • Sertac Karaman: We are finding that this new approach to programming robots, which involves thinking about hardware and algorithms jointly, is key to scaling them down.
    • Michael Elling: When do people wake up and say that we’ve moved full circle back to something that looks like the hierarchy of the old PSTN? Just like the circularity of processing, no?
    • Benedict Evans: Content and access to content was a strategic lever for technology. I’m not sure how much this is true anymore.  Music and books don’t matter much to tech anymore, and TV probably won’t matter much either. 
    • SeaChangeViaExascaleOnDown: Currently systems are still based around mostly separately packaged processor elements(CPUs, GPUs, and other) processors but there will be an evolution towards putting all these separate processors on MCMs or Silicon Interposers, with silicon interposers able to have the maximum amount of parallel traces(And added active circuitry) over any other technology.
    • BoiledCabbage: Call me naive, but am I the only one who looks at mining as one of the worst inventions for consuming energy possible?
    • Amin Vahdat (Google):  Putting it differently, a lot of software has been written to assume slow networks. That means if you make the network a lot faster, in many cases the software can’t take advantage of it because the software becomes the bottleneck.
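As a small aside on Mark Shead's description of finite state machines quoted above, here is a tiny sketch of my own (the classic turnstile example, not from the quoted article) showing a transition table driving state changes:

# Each (state, input) pair maps to the next state
TRANSITIONS = {
    ("locked", "coin"): "unlocked",
    ("locked", "push"): "locked",
    ("unlocked", "push"): "locked",
    ("unlocked", "coin"): "unlocked",
}

def run(inputs, state="locked"):
    for symbol in inputs:
        state = TRANSITIONS[(state, symbol)]
    return state

print(run(["coin", "push", "push"]))  # -> locked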

  • Dropbox has 1.3 million lines of Go code, 500 million users, 500 petabytes of user data, 200,000 business customers, and a multi-exabyte Go storage system. Go Reliability and Durability at Dropbox. They use it for: RAT: rate limiting and throttling; HAT: memcached replacement; AFS: file system to replace global Zookeeper; Edgestore: distributed database; Bolt: for messaging; DBmanager: for automation and monitoring of Dropbox’s 6,000+ databases; “Jetstream”, “Telescope”, block routing, and many more. The good: Go is productive, easy to write and consume services, good standard library, good debugging tools. The less good: dealing with race conditions.

  • Professor Jordi Puig-Suari talks about the invention of CubeSat on embedded.fm. 195: A BUNCH OF SPUTNIKS. Fascinating story of how thinking different created a new satellite industry. The project wasn't on anyone's technology roadmap, nobody knew they needed it, it just happened. A bunch of really bright students, in a highly constrained environment, didn't have enough resources to do anything interesting, so they couldn't build spacecraft conventionally. Not knowing what you're doing is an advantage in highly innovative environments. The students took more risk and eliminated redundancies. One battery. One radio. Taking a risk that things can go wrong. They looked for the highest performance components they could find, these were commercial off the shelf components that when launched into space actually worked. The mainline space industry couldn't take these sort of risks. Industry started paying attention because the higher performing, lower cost components, even with the higher risk, changed the value proposition completely. You can make it up with numbers. You can launch 50 satellites for the cost of one traditional satellite. Sound familiar? Cloud computing is based on this same insight. Modern datacenters have been created on commodity parts and how low cost miniaturized parts driven by smartphones have created whole new industries. CubeSats' had a standard size, so launch vehicles could standardize also, it didn't matter where the satellites came from, they could be launched. Sound familiar? This is the modularization of the satellite launching, the same force that drives all mass commercialization. Now the same ideas are being applied to bigger and bigger spacecraft. It's now a vibrant industry. Learning happens more quickly because they get to fly more. Sound familiar? Agile, iterative software development is the dominant methodology today. 

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...


          A shiny new look, improved (Open Source) docs, and touch down in Singapore        

Picture the scene. It’s late at night. You’re surrounded by takeout boxes and coffee cups. You’re not sure when it got dark and you really need the bathroom, but just… 10.. minutes… more… perfecting your distributed blockchain deep-learning-powered todo list app (that also controls the weather). Sound familiar? We get it: we’ve been there too. […]



          Nuit Blanche in Review (July 2017)        
Since the last Nuit Blanche in Review (June 2017), it was found that Titan had interesting chemistry. On Nuit Blanche, on the other hand, we had four implementations released by their authors and several interesting in-depth articles (some of them related to SGD and Hardware). We had several slides and videos from meetings and schools, and three job offerings. Enjoy!


In-depth

SGD related

CS/ML Hardware


Slides

Videos

Job:

Other 


Credit: Northern Summer on Titan, NASA/JPL-Caltech/Space Science Institute


Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

          Slides: Deep Learning and Reinforcement Learning Summer School 2017 @ MILA Montreal, Canada        
The Deep Learning and Reinforcement Learning Summer School 2017 just finished and here are some of the slides presented there (videos should be coming later) 



Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

          Intel puts Movidius AI tech on a $79 USB stick        

Last year, Movidius announced its Fathom Neural Compute Stick — a USB thumb drive that makes its image-based deep learning capabilities super accessible. But then in September of last year, Intel bought Movidius, delaying the expected winter rollout of Fathom. However, Intel has announced that the deep neural network processing stick is now available and going by its new name, the Movidius Neural Compute Stick. "Designed for product developers, researchers and makers, the Movidius Neural Compute Stick aims to reduce barriers to developing, tuning and deploying AI applications by delivering dedicated high-performance deep-neural network processing in a small form factor," said Intel in a statement.

Source: Intel


          Data & Analytics News – October 2016        
There have been months of a lot of movement in the data world, and these past weeks bring quite a few new announcements. Here are some interesting news items:

Microsoft releases beta of Microsoft Cognitive Toolkit for deep learning advances
https://blogs.microsoft.com/next/2016/10/25/microsoft-releases-beta-microsoft-cognitive-toolkit-deep-learning-advances/#sm.000b3fe8i16tbcz6116yzp98ytydw

Announcing Azure Analysis Services preview
https://azure.microsoft.com/en-us/blog/introducing-azure-analysis-services-preview/

SQL Server 2016 Training
https://www.microsoft.com/en-us/download/details.aspx?id=54089

Best regards
          IBM speeds deep learning by using multiple servers        

For everyone frustrated by how long it takes to train deep learning models, IBM has some good news: It has unveiled a way to automatically split deep-learning training jobs across multiple physical servers -- not just individual GPUs, but whole systems with their own separate sets of GPUs.

Now the bad news: It's available only in IBM's PowerAI 4.0 software package, which runs exclusively on IBM's own OpenPower hardware systems.

Distributed Deep Learning (DDL) doesn't require developers to learn an entirely new deep learning framework. It repackages several common frameworks for machine learning: TensorFlow, Torch, Caffe, Chainer, and Theano. Deep learning projects that use those frameworks can then run in parallel across multiple hardware nodes.



          Comment on Deep learning algorithms generate stock market returns in the double digits from 1992 to 2015 by Financial – Exponential-Technology.com        
[…] Source: www.innovationtoronto.com/2017/03/deep-learning-algorithms-generate-stock-market-returns-in-the-doub… […]
          AI Now Comes in a USB Stick        
By Phil Goldstein

Intel’s Movidius Neural Compute Stick can deliver artificial intelligence and deep-learning capabilities to entrepreneurs, product developers and tinkerers. 


          Embodied Cognition        
The Deep Mind of Demis Hassabis - "The big thing is what we call transfer learning. You've mastered one domain of things, how do you abstract that into something that's almost like a library of knowledge that you can now usefully apply in a new domain? That's the key to general knowledge. At the moment, we are good at processing perceptual information and then picking an action based on that. But when it goes to the next level, the concept level, nobody has been able to do that." (previously: 1,2) also btw... -The next big frontier is the mind and brain -Demis Hassabis on Computational Neuroscience -Systems neuroscience and AGI -Neural Networks and Deep Learning -Google Search Will Be Your Next Brain
          271 RR Problems New Developers Don’t Realize They Have and Hidden Tradeoffs to Coding Decisions Developers Have to Make with Justin Weiss        

Rails Remote Conf

 

01:14 - Justin Weiss Introduction

02:15 - “Learning Rails Without Getting Overwhelmed”?

02:34 - Problems New Developers Don’t Realize They Have

04:35 - Learning New Things

08:05 - What is a success?

09:02 - What can senior devs do? What shouldn’t they do?

15:43 - Are there still “Architects”?

20:45 - The Existential Crisis of Software Development

22:26 - The Responsibility of the Students

26:08 - How can new developers obtain objective evidence of their blind spots?

33:49 - Early Career Developers Working Together

37:03 - Learning Practices

 

Picks


          ì¸ê³µì§€ëŠ¥ì´ 만든 작품이 예술로 인정받을 수 있을까?        

The Turing test was proposed by Turing to distinguish humans from computers. If a Turing test for works of art existed, could artificial intelligence pass it?


Recently, a team led by Ahmed Elgammal at Rutgers University's Art & AI Lab published a paper reporting that creative artworks produced by AI surpassed some contemporary artworks.


Can a work created by artificial intelligence be recognized as art? (photo: the Turing test)



The Turing test judges whether a machine is intelligent by how closely its conversation resembles that of a human. Devised by Alan Turing, it is famous for providing the first criterion for the "humanness" of AI. Turing regarded an AI that converses at a level indistinguishable from a person as having intelligence on par with a human.



Works created with the CAN artificial neural network



In an era when deep learning has let AlphaGo conquer the world of Go and Google's neural machine translation handles countless translations, AI is now reaching toward art and even creativity. One example is the algorithm that turns any photo into a painting in the style of Van Gogh's "The Starry Night." Such examples prompt reflection about artistic style, but they fall well short of creativity.


So can we expect "creativity," something thought to be uniquely human, from such AI? And could we build a new Turing test that judges creativity and measure an AI's creativity quantitatively?



Ahmed Elgammal built CAN (Creative Adversarial Network), an algorithm that can create artworks. It is an artificial neural network derived from GAN (Generative Adversarial Network), one of the most prominent recent architectures. Whereas conventional machine learning is trained by showing it a photo of a dog together with the label "this is a dog," a GAN uses two neural networks that learn by competing with each other, with one of them directly generating "fake" data; it has been the most talked-about neural network structure since last year.


An algorithm that renders a photo of the Eiffel Tower in the style of Van Gogh's "The Starry Night"


The first network in a GAN recognizes and classifies images, like a conventional neural network. The second network generates samples that look as real as possible. The first network then judges whether a generated sample resembles the style of the real samples, and if so the sample is selected as an output. In other words, the second network tries to produce samples that are as realistic as possible, while the first network tries to tell the generated samples apart, so the two keep improving in an attempt to fool each other.
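To make the generator/discriminator interplay described above concrete, here is a minimal, generic GAN training loop in PyTorch (a toy sketch on 1-D data of my own, not the CAN model from the paper):

import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))                # generator: noise -> sample
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())  # discriminator: sample -> "real?"
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    real = torch.randn(64, 1) * 0.5 + 3.0            # "real" data drawn from N(3, 0.5)

    # Discriminator step: label real samples 1 and generated samples 0
    fake = G(torch.randn(64, 8)).detach()
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: try to make the discriminator output 1 on generated samples
    fake = G(torch.randn(64, 8))
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()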

How a GAN works: one network plays the role of the generator and the other the role of the discriminator.


Elgammal's CAN takes this a step further: it creates artworks that aim for novelty by differentiating themselves from existing works, while still staying broadly within the category of art. Of the many art movements, the team chose abstract expressionism, where human creativity stands out more than in, say, hyperrealism, so that the limits of the AI could be tested. Elgammal says the goal was works that are "new, but not too new": "works that cannot be found among existing ones, but that do not repel people."


 

How CAN works



As a result, people gave higher scores to the CAN-generated artworks than to contemporary works actually exhibited at the 2016 contemporary art fair in Basel, Switzerland, and in practice could not tell them apart well. In one sense this suggests the AI-generated works satisfied what contemporary art calls "creativity," but the team left room for other interpretations, saying they would "evaluate human-made works and CAN-generated works from other perspectives as well."


Abstract expressionist works created with CAN. All of these received high ratings.


Following these results, a more heated debate will be needed about how to judge the creativity of AI-made artworks. Just as the Turing test took conversation at a human level as its criterion, perhaps a philosophical definition of creativity will be established and the artistry of AI judged against it. Or should a work of art be defined as something made only by a human? Whatever standard is eventually set, Elgammal's results have clearly blurred the line between artistry and creativity.


Reference article


Machine Creativity Beats Some Modern Art

License: Creative Commons Attribution-NonCommercial-NoDerivatives

          Daft Punk+Tool=Muse: word2vec model trained on a small Rock music corpus        
In my last blog post, I outlined a few interesting results from a word2vec model trained on half a million news documents. This was pleasantly met with some positive reactions, some of them due not so much to the scientific rigour of the report as to the awareness effect of such a "populist treatment of the subject" on the community. On the other hand, there were more than a few negative reactions. Some believed I was "cherry-picking" and reporting only a handful of interesting results out of an ocean of mediocre performances. Others rejected my claim that training on a small dataset in any language can produce very encouraging results. And yet others literally threatened me into releasing the code, despite my reiterating that the code is small and not the point.

Am I the only one here thinking word2vec is freaking awesome?!

So I am back. And this time I have trained the model on a very small corpus of Rock artists obtained from Wikipedia, as part of my Rock History project. And I have built an API on top of the model so that you can play with it and try out different combinations to your heart's content - [but please go easy on the API, it is a small instance only] :) strictly no bots. And that's not all: I am releasing the code and the dataset (which is only 36K Wiki entries).

But now, my turn to RANT for a few paragraphs.

First of all, quantification of the performance of an unsupervised learning algo in a highly subjective field is very hard, time-consuming and potentially non-repeatable. Google in their latest paper on seq2seq had to resort to reporting mainly man-machine conversations. I feel in these subjects crowdsourcing the quantification is probably the best approach. Hence you would help by giving a rough accuracy score according to your experience.


On the other hand, sorry, those who were expecting to see a formal paper - perhaps in LaTeX format - you completely missed the point. As others said, there are plenty of hardcore papers out there; feel free to knock yourselves out. My point was to evangelise to a much wider audience. And, if you liked what you saw, go and try it for yourself.

Finally, alluding to "cognition" raised a lot of eyebrows, but as Nando de Freitas puts it when asked about intelligence, whenever we build an intelligent machine we will look at it as bogus, not containing the "real intelligence", and we will discard it as not AI. So the world of Artificial Intelligence is a world of moving targets, essentially because intelligence has been very difficult to define.

For me, word2vec is a breath of fresh air in a world of arbitrary, highly engineered and complex NLP algorithms: it bridges the gap by forming a meaningful relationship between the tokens of your corpus. And I feel it is more a tool for enhancing other algorithms than an end product. But even on its own, it generates fascinating results. For example, on this tiny corpus it was not only able to match the names of artists, but it can successfully find matches between similar bands - so it could be used as a recommender system. And then, even adding the vectors of artists generates interesting fusion genres which tend to correspond to real bands influenced by them.

API

BEWARE: Tokens are case-sensitive, so u2 and U2 are not the same.

The API is basically a simple RESTful flask on top of the model:
http://localhost:5000/api/v1/rock/similar?pos=<pos>&neg=<neg>
where pos and neg are comma-separated lists of zero or more 'phrases' (pos for similar, and neg for opposite) - these are English words, or multi-word tokens including names of bands or phrases that have a Wiki entry (such as albums or songs) - a list of which can be found here.
For example:
http://localhost:5000/api/v1/rock/similar?pos=Captain%20Beefheart


You can add vectors of words, for example to mix genres:
http://localhost:5000/api/v1/rock/similar?pos=Daft%20Punk,Tool&min_freq=50
or add an artist with an adjective for example a softer Bob Dylan:
http://localhost:5000/api/v1/rock/similar?pos=Bob%20Dylan,soft&min_freq=50
Or subtract:
http://localhost:5000/api/v1/rock/similar?pos=Bob%20Dylan&neg=U2
But the tokens do not have to be a band name or artist names:
http://localhost:5000/api/v1/rock/similar?pos=drug
If you pass a non-existent name or word, or a misspelling (it is case-sensitive!), you will get an error:
http://localhost:5000/api/v1/rock/similar?pos=radiohead

{
result: "Not in vocab: radiohead"
}

You may pass a minimum frequency of the word in the corpus to filter the output and remove the noise:
http://localhost:5000/api/v1/rock/similar?pos=Daft%20Punk,Tool&min_freq=50
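For reference, here is a hypothetical sketch of how such an endpoint could be wired up with Flask and gensim (the actual wiki_rock code on GitHub may differ; the model path, parameter handling and output format are my assumptions):

from flask import Flask, jsonify, request
from gensim.models import Word2Vec

app = Flask(__name__)
model = Word2Vec.load("wiki_rock.model")  # hypothetical path to the trained model

@app.route("/api/v1/rock/similar")
def similar():
    pos = [t for t in request.args.get("pos", "").split(",") if t]
    neg = [t for t in request.args.get("neg", "").split(",") if t]
    min_freq = int(request.args.get("min_freq", 0))
    try:
        hits = model.wv.most_similar(positive=pos, negative=neg, topn=50)
    except KeyError as err:
        return jsonify(result=str(err))
    # Keep only results whose corpus frequency clears min_freq (gensim 4.x API)
    hits = [(w, float(s)) for w, s in hits
            if model.wv.get_vecattr(w, "count") >= min_freq][:10]
    return jsonify(result=hits)

if __name__ == "__main__":
    app.run(port=5000)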

Code

The code on GitHub, as I said, is tiny. Perhaps the most complex part is the Dictionary Tokenisation, one of the tools I have built to tokenise the text without breaking multi-word phrases; I have found it very useful, as it produces much more meaningful results.

The code is shared under MIT license.

To build the model, uncomment the line in wiki_rock_train.py, specifying the location of corpus:

train_and_save('data/wiki_rock_multiword_dic.txt', 'data/stop-words-english1.txt', '<THE_LOCATION>/wiki_rock_corpus/*.txt')

Dataset

As mentioned earlier, dataset/corpus is the text from 36K Rock music artist entries on the Wikipedia. This list was obtained by scraping the links from the "List of rock genres". Dataset can be downloaded from here. For information on the Copyright of the Wikipedia text and its terms of use please see here.
          What is Artificial Intelligence, Machine Learning, and Deep Learning?        

Crossposted from ingomierswa.com.   There is hardly a day where there is no news on artificial intelligence in the media.  Below is a short collection of some news headlines from the past 24 hours only: Artificial Intelligence Comes to Hollywood – Is your job safe? This robot explains why you shouldn’t worry about artificial intelligence […]



          Google Explains Machine Learning And Deep Learning; Plus: Short Takes From Educause 2016        
  Machine Learning is an important and rapidly developing concept in computer science and for higher education in general....

Read More


          Self-Driving Deep Learning with Lex Fridman        
Self-driving cars are here. Fully autonomous systems like Waymo are being piloted in less complex circumstances. Human-in-the-loop systems like Tesla Autopilot navigate drivers when it is safe to do so, and let the human take control in ambiguous circumstances. Computers are great at memorization, but not yet great at reasoning. We cannot enumerate to a computer every single circumstance that a car might find itself in. The computer needs to

Continue reading...


          Distributed Deep Learning with Will Constable        
Deep learning allows engineers to build models that can make decisions based on training data. These models improve over time using stochastic gradient descent. When a model gets big enough, the training must be broken up across multiple machines. Two strategies for doing this are “model parallelism” which divides the model across machines and “data parallelism” which divides the data across multiple copies of the model. Distributed deep learning brings
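As a toy illustration of the data-parallel strategy mentioned here (my own numpy sketch, not the approach discussed in the episode): each worker computes a gradient on its shard of the data, and the averaged gradient updates one shared model.

import numpy as np

def gradient(w, X, y):
    # Gradient of mean squared error for a linear model y ~ X @ w
    return 2.0 * X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
X, true_w = rng.normal(size=(1000, 5)), np.arange(5.0)
y = X @ true_w

w = np.zeros(5)
shards = list(zip(np.array_split(X, 4), np.array_split(y, 4)))  # 4 "workers"
for step in range(200):
    grads = [gradient(w, Xs, ys) for Xs, ys in shards]  # run in parallel in a real system
    w -= 0.1 * np.mean(grads, axis=0)                   # averaged update to the shared model

print(np.round(w, 3))  # converges toward [0, 1, 2, 3, 4]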

Continue reading...


          Data Scientists at True Build a Deep Learning Thai Word Segmenter with Keras, Open-Sourced under MIT        

Rakpong Kittinaradorn and Korakot Chaovavanich, data scientists at True Corporation, have open-sourced deepcut, a deep learning word segmentation system for Thai built with Keras.

The training data is NECTEC's BEST corpus, split 90% for training and 10% for testing. The model tries to decide whether each character is the start of a word (in the code, a predicted value above 0.5). On the test set it achieves an f1 score of 98.8%, a precision score of 98.6%, and a recall score of 99.1%.
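Usage is a one-liner; a quick sketch (assuming the pip-installable deepcut package, which exposes a tokenize function):

import deepcut

# Segment a Thai string into a list of word tokens
print(deepcut.tokenize("คณะกรรมการการเลือกตั้ง"))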

I tested it against libthai, which is commonly used on Linux (see the images at the end of the post), using the sample sentence "คุณบ็อตบอกว่าวันนี้พิมพ์ไม่ผิดแต่ตัดแบบนี้จะดีเหรอ คณะกรรมการการเลือกตั้งกรมวิทยาศาสตร์การแพทย์ เขานอนตากลมตากลมไปมา"

Source: GitHub:rkcosmos/deepcut, Thailand Deep Learning

Output from deepcut (screenshot)

Output from libthai (screenshot)


          Deep Learning - Unreasonably Effective        
In this podcast, Stephen Jones from Nvidia presents: Deep Learning - Unreasonably Effective.

Deep Learning is a new area of Machine Learning research, which has been introduced with the objective of moving Machine Learning closer to one of its original goals: Artificial Intelligence. At the 2015 GPU Technology Conference, you can join the experts who are making groundbreaking improvements in a variety of deep learning applications, including image classification, video analytics, speech recognition, and natural language processing.
Watch the video presentation: http://wp.me/p3RLHQ-dTb

          Deep Learning on Qubole Using BigDL for Apache Spark – Part 2        

In Part 1 you learned how to get started with installing distributed deep learning library BigDL on Qubole. In this Part 2 of a two-part…



          Deep Learning on Qubole Using BigDL for Apache Spark – Part 1        

BigDL runs natively on Apache Spark, and because Qubole offers a greatly enhanced and optimized Spark as a service, it makes for a perfect deployment…



          GrrCON 2016 Videos         
Link:http://www.irongeek.com/i.php?page=videos/grrcon2016/mainlist
These are the videos of the presentations from GrrCON 2016. Big thanks to EggDropX and Jaime for having me out, and my video crew  (Chris, Erick, & Cooper) for recording.

Thieves

Act Three, The Evolution of Privacy
Finux

Weaponizing Nanotechnology and hacking humans; defining the boundaries
Chris Roberts

Becoming a Cyborg: The First Step Into Implantable Technology
Michael Vieau

Abnormal Behavior Detection in Large Environments
Dave Kennedy

Secure Dicks
Michael Kemp

and bad mistakes I've made a few...
Jayson Street (Only first 30 min)

Predator to Prey: Tracking Criminals with Trojans and Data Mining for Fun and Profit
Ken Westin

Guarding Dinner
J Wolfgang Goerlich

Back to the Future: Understanding our future but following the past
Kevin Johnson

Breaking Android Apps for Fun and Profit
Bill Sempf

Attacking the Hospitality and Gaming Industries: Tracking an Attacker Around the World in 7 Years
Matt Bromiley & Preston Lewis

Security Guards -- LOL! Brent White & Tim Roberts

Pirates

Internet of Things (IoT) radio frequency (RF) Analysis With Software Defined Radio
Kevin Bong

So You Want to Be a Pentester
Absolute0x0

What do you mean I'm pwn'd! I turned on automatic updates!
Scott Thomas & Jeff Baruth

Surreal Paradigms: Automotive Culture Crash
D0xt0r Z3r0

Reversing and Exploiting Embedded Devices (Walking the software and hardware stack)
Elvis Collado

Threat Detection & Response with Hipara
J. Brett Cunningham

Still Broken After All These Years Aka Utility Security For Smarties
Doug Nibbelink

Threat Detection Response with Hipara
J Brett Cunningham

Quick and Easy Windows Timelines with Pyhon, MySQL, and Shell Scripting
Dr. Phil Polstra

Cruise Ship Pentesting OR Hacking the High Seas
Chad M. Dewey

Using Virus Total Intelligence to track the latest Phishing Document campaigns
Wyatt Roersma

Encryption, Mobility & Cloud Oh My!
Bill Harmer

Magnetic Stripes 101
Tyler Keeton

Machine Duping: Pwning Deep Learning Systems
Clarence Chio

Money, Fame, Power - Build your success as a security professional
Nathan Dragun

Tales from the Crypt...(analyst)
Jeff Man

What's in your Top Ten? Intelligent Application Security Prioritization
Tony Miller

Binary Ninja
Jared Demott

Phish your employees for fun!
Kristoffer Marshall

Mad Scientists

Securing Trust - Defending Against Next-generation Attacks
John Muirhead-Gould

Five Nights At Freddys: What We Can Learn About Security From Possessed Bears
Nick Jacob

Make STEHM Great Again
David "HealWHans" Schwartzberg

Pentester-to-customer:I will 0wn your network! - Customer-to-pentester:No, I will make you cry!
David Fletcher & Sally Vandeven

How Do You Secure What You Don't Control
Dimitri Vlachos

Fighting the Enemy Within
Matt Crowe

Getting to the Root of Advanced Threats Before Impact
Josh Fazio

Reality-Checking Your AppSec Program
Darren Meyer

How to Implement Crypto Poorly
Sean Cassidy

Stop attacking your mother's car!
Charles Parker, II

Contracting: Privacy Security and 3rd Party
Nathan Steed & Kenneth Coleman

Alignment of business and IT Security
Shane Harsch

So You've Inherited a Security Department, Now What?
Amanda Berlin

Piercing the Air Gap: Network Steganography for Everyone
John Ventura

On being an Eeyore in Infosec
Stefan Edwards

Welcome to The World of Yesterday, Tomorrow!
Joel Cardella

Board Breaking


          PowerAI Revolutionizes Deep Learning (Again!) with Release 4        

I’m excited to share with you that IBM has just released PowerAI release 4 which includes a technology preview of the record breaking Distributed Deep Learning technology we announced earlier this week. Drawing on IBM’s deep expertise in AI, in high-performance computing and system design, we have announced breakthrough results in both accuracy and performance in Deep Learning this week.

Using a cluster of 64 IBM advanced Deep Learning Power servers with 256 GPUs, we demonstrated a new speed record for training today’s most advanced neural networks in 50 minutes, a significant improvement over the hour-long training time reported by Facebook last month. At the same time, this work also significantly boosted neural network accuracy by over 13% for networks trained on very large data sets with over 7.5 million images, improving accuracy to 33.8% from the previous best results at 29.8% published by Microsoft (https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-chilimbi.pdf).

Accelerating the training of Deep Neural Networks (DNNs) is not an idle competition but has direct impact on how DNNs can be applied to real-world problems. High-speed training enables AI developers and data scientists to develop better neural models for their applications by interactively exploring and optimizing DNN architectures. Thus, speed records are not about idle competition, but about empowering technology end users to find better solutions. Underscoring that dual commitment to advancing both the state of the art in AI and making the technology available to its users immediately, IBM is the only AI leader making the new technological advances available to all users concurrently with announcing the breakthrough by releasing them as a technology preview as part of PowerAI.

The advances were obtained by applying IBM’s deep expertise in system design and high-performance computing to deep learning with a close collaboration between the IBM Research and product divisions, combining PowerAI software, deep learning servers and excellence in high-performance computing research.

Drawing on IBM’s deep, decade-long expertise in high-performance parallel systems, the Distributed Deep Learning framework achieves an unprecedented scaling efficiency of 95%, ensuring the computing resources are used efficiently (https://www.ibm.com/blogs/research/2017/08/distributed-deep-learning/).

AI users can use the technologies that will power the world’s fastest CORAL supercomputers at the US national laboratories (https://openpowerfoundation.org/press-releases/department-of-energy-awards-425-million-for-next-generation-supercomputing-technologies/) to enhance the quality and speed of deep learning applications today. Advances like these are only possible in an open, standards-based environment that brings together the industry’s best technologies: IBM’s deep learning servers and CORAL technologies are created in the OpenPOWER ecosystem for collaborative innovation, bringing together IBM’s advanced Power system designs, Nvidia’s GPU numerical accelerators, and Mellanox high-performance networking.

The advances were obtained with PowerAI, which provides a stable, compatible environment for transforming scientific research and businesses with AI technologies and for delivering breakthrough innovations. The Distributed Deep Learning technology is available to PowerAI users today as a technology preview for Caffe and TensorFlow in PowerAI Release 4, which is available for free download at ibm.biz/powerai.


          Intel Democratizes Deep Learning Application Development with Launch of Movidius Neural Compute Stick        

Today, Intel launched the Movidius™ Neural Compute Stick, the world’s first USB-based deep learning inference kit and self-contained artificial intelligence (AI) accelerator that delivers dedicated deep neural network processing capabilities to a wide range of host devices at the edge. Designed for product developers, researchers and makers, the Movidius Neural Compute Stick aims to reduce … Continued



          After PASS Summit 2016 Recap (As seen by me!)        

In my last blog entry, I promised to blog about the PASS Summit each night when I got back to the room. This was a failure for two reasons. 1. I was always out at night and then exhausted. 2. I forgot the keyboard to my Surface Pro. I tweeted about it, and was picked on by the @surface twitter account:

image

But I did tweet about the event rather frequently, as it is much easier to capture your ideas and comments in 140 character spurts (and favoriting other posts saves typing too.) If you want to read all of the tweets about the summit, look for the #sqlsummit hashtag on Twitter.

The first day was the Microsoft Keynote. It was led by Joseph Sirosh and while there wasn't a tremendous amount of stuff that directly excites this relational engine programmer (though love was shown for how awesome the SQL Server Engine is, both on prem and in the cloud), some of the stuff that was shown was really cool:

1. Showing the various sources of data you can use with Polybase

2. SQL Server on Linux - Not that I will ever use this, but it could be a boon to SQL Server usage as time passes (and for a relational programmer, you would not really notice much of a change anyhow)

3. Azure Analysis Services is coming soon

4. Azure SQL DW has had some tools created to make it easier to get started with (RedGate has a free tool at http://www.red-gate.com/products/azure-development/data-platform-studio/), and as Tim Ford tweets here: (https://twitter.com/sqlagentman/status/791316930703478784), you can get a free month of SQL DW to give it a try.

The biggest takeaway was just how much data is going to affect our lives as time passes. Last year, my reaction was that the keynote was a bit creepy, taking mapping DNA and predicting health. This year, it was a couple of examples that were really cool, including some apps, websites, a few game examples, and sentiment analysis of the book War and Peace (https://twitter.com/drsql/status/791320039303491584) by Julie Koesmarno.

An interesting turn of technology was the push towards "intelligence database" platforms. Something that many of my colleagues have discussed for years has been to leverage the data tier to get work done faster, and more reliably. What had always been missing in those scenarios has been scaling out. Hence we were constantly limited to how much we could do on a single computer. Two things have changed since those days. 1. A single computer can do as much work as most organizations need to. 2. Scaling out is far easier when dealing with read intensive scenarios. There was a demo of SQL Server 2016 handling millions of transactions where the reality was orders of magnitude lighter (and we are talking fraud detection for major credit card companies).

However, the most moving demo finished out the keynote, and it was the closest to creeped out that I got. There was a computer guessing ages, (I think) gender, etc. Then the computer was describing the surroundings. Then the computer was reading a menu at a restaurant. And then you realize this was a computer helping a blind man. Wow. That was just an amazing use of technology.

If you want to know what Joseph Sirosh (the Corp VP for the Data Group at Microsoft) felt were the top five announcements, he shared it here: https://twitter.com/josephsirosh/status/790950683138596865. Stuff I didn't mention was really outside of what I know (ok, I admit it, care) about (I do only have so much time!)

-------------

After this I attended several pretty great sessions:

  • Why Datatype Choice Matters from Andy Yun, where he covered some of the internals of datatypes. The best part for me was the statement that "NULL isn't a value, it is a state of being unknown, undefined.  Hence the null bitmap in the physical record of a row." While I have written about NULL probably hundreds of times, it is good to be reminded of this point, that NULL isn't really a value, even though it does feel like it.
  • Building an SSRS monitoring system with Stacia Varga (a cowriter on this book). She covered a lot of stuff about logging that I may never use, but one thing I learned about that I might directly use is logman.exe, which lets you capture perfmon counters. There is an article here about capturing SSRS statistics: https://msdn.microsoft.com/en-us/library/ms159809.aspx).
  • Then Tom LaRock and Karen Lopez duked it out again talking about time bombs you have lurking in your database code. You know like NULLs no one understands, identity column values that no one pays attention to when the values run out.

----------------

Something I am keen to learn more about came in two sessions: Buck Woody the first day and Dr Cecilia Aragon. Data Science. I don't know if I could, or would want to, become a data scientist. But in either case it leads me down the path of wanting to make sure that databases I create are ready to be a source of some of that data. I have always been a proponent of tailoring my OLTP database designs to capturing every detail that is possible. For example, cause and effect, when it is direct (such as a shipment to an order), or indirect (such as a follow-on order that the customer tells you, or gets in a link to, a previous order.)  Data Science is about learning more about everything, and the more answers you can provide an algorithm, the better it can help you see others behaving the same way.  Capturing additional data that isn't needed immediately is not always something that is greeted by developers with a hearty smile, but it is almost always going to be useful.

Buck Woody pointed out a website (http://tylervigen.com/spurious-correlations) that has some excellent, messed up, correlations that you can make using data. Such as "Per capita consumption of chicken" and "Total US crude oil imports":

image

I eat a lot of hot chicken to try to help, but I am only one person!  These correlations were highlighted even more by Dr Aragon, who had a couple of very interesting quotes that piqued my interest:

"Data science is driven more by intellectual ecosystems and software ecosystems than by hardware"

(Paraphrasing) "Humans are not gaining exponentially greater cognitive capacity."

"Big data is data that is 2 orders of magnitude greater than you are accustomed to"

For me, these three quotes really put Data Science in perspective. People are now, and have been, very intelligent, regardless of how things seem at times. But what we really lack is the ability to process concepts quickly. People make algorithms, and could slog through data manually, but rather let computers whip through data and give us decisions. Will there ever be a time where machines make correlations that are completely wrong, but they act on them anyhow? It reminds me of Robot Santa Claus on Futurama who judged everyone naughty, the person who was naughty, and the person who told on the person.

Will we ever make a machine that can come up with algorithms, and understand what is a meaningful correlation without some human logic? Heaven knows that every person who creates a machine won't be good at heart, but could machines ever be machines without people?

It does all remind me of the Pete Townshend song "Man and Machines" from the Iron Man album :

"Man makes machines
To man the machines
That make the machines
That make the machines
Make a machine
To make a machine
And man and machine
Will make a machine
To break the machines
That make the machines..."

On Singularity Hub I was reading an article about AI, which, while it isn't exactly the same thing, has many of the same problems. There was a statement:

"Based on deep neural nets, the AI impressively mastered nostalgic favorites such as Space Invaders and Pong without needing any explicit programming — it simply learned through millions of examples."

If you stop at "without needing any explicit programming", this sounds pretty creepy. But if you give the computer an example of a successful solution, perhaps even millions of them, and combine this with the fact that computers don't make tiny mistakes (you know, what makes games fun!) it isn't that the computer can learn by itself. Just that it can try, fail, adjust, and repeat a LOT faster than people. But it still takes a human to guide the process.

-----------------

The second keynote had two major parts. First was the PASS business stuff. We have more chapters, more members and want orders of magnitude more people. One way of pushing this direction is, much like the MVP program did, including the entire data platform. PASS no longer means Professional Association of SQL Server, but just PASS. New logo too:

image

The little symbols represent what PASS encompasses in who PASS is as an organization, and we as PASS members. Interesting enough, but I always worry that things are going to go sideways and we will end up in a different community of mostly the same people. Time will tell.

The second part was an excellent keynote address by Dr David DeWitt. It had some interesting details and comparisons of online data warehouse products, but was a lot broader than that. Good overview of internal stuff that can only help your career. I won't cover anything about it, go here (http://www.sqlpass.org/summit/2016/PASStv.aspx) and watch it.  Best quote for me: "SQL Server, Best Query Engine". But other companies are doing great stuff too.

----------------

Then I went to a discussion about the way sessions are chosen. PASS chooses sessions in a very interesting way, but really I think they do a good job. No matter how you do it, someone's feelings will get hurt unless you use my favorite method for SQL Saturday session choosing: everyone gets a session if you haven't angered the organizers in some meaningful manner. Best way to anger the organizers: don't show up without a good excuse. Yes, it happens too often. And that, along with harassing others at an event (or multiple events), is something that takes a while to get over. Best way to recover: be apologetic, attend events, and don't be a jerk again.

-----------------

The other big thing that happens on the second day is that a group of folks wears kilts to PASS to show support for Women in Technology. This year, I was one of those people. It was not a normal thing for me, and not something I expect to do for a SQL Saturday unless it is for something special. Want to see the picture? Click this link to see Jamie Wick's tweet of a picture that was taken: https://twitter.com/Jamie_Wick/status/791739875439456256

-----------------

Friday, we woke up to a rather interesting sight for a PASS conference, even more interesting than me in a kilt. The sun came out.

Attended one more regular session of note: Tim Mitchell's Deep Dive of the SSISDB catalog, where I knew most everything, but using Data Taps to capture intermediate results, much like you might use a temp table in a SQL query batch, was very nice. I hope to run a series of blogs about some work I have done with the SSISDB catalog over the next year or so. Another interesting idea: using SSISDB versions for regression testing. Run once, deploy new, run again, compare results, then revert.

The other thing I went to was Speaker Idol, supporting my man Robert @SQLCowbell Verell. We co-lead the Nashville SQL User Group, and it helps us if Robert gets a speaking slot :) Robert was the wild-card/runner-up of the day (there are three rounds of four, with a final round of four to complete the day), and he did a great job. I really felt nervous for all of the people who participated, because what pressure. I have long sweated the process of speaking, because of all of those eyes staring at you, seemingly expecting perfection (actually just expecting to learn a few new bits and pieces.) And here, while you have only 10 eyes staring at you, this time they actually are expecting perfection. In the end he didn't win, but he certainly didn't embarrass himself, since he made the finals despite having a yellow background for text in SSMS that still is burned into my eyes.

------------------

Then it just sort of ended… No fanfare, just a walk down to the Moore Theatre to catch Ian Anderson do his Jethro Tull rock opera. I hadn't even noticed there were concerts I cared about in the area, and prior to this year I would never have wandered that far from the hotel most nights, but I discovered the ease of Uber while there, which made walking less scary (since I occasionally aggravate my knee when walking as much as I did this week!). Walking back from the show, I ran into Rodney Kidd, who had a lot of great stories about music (we were both at the Homewood Suites). Add that to the stories that Pat Phelan shared at breakfast that morning about his cool experiences, and I had a great time even outside of the event.

Well, can't wait until next year!


          AU10TIX Launches New OCR for Difficult to Read ID Documents Using Deep-Learning Algorithms        

AU10TIX's breakthrough Deep Learning OCR (DL-OCR) offers success rates of up to 96% in handling "noisy", graphics-cluttered ID documents. AU10TIX's new DL-OCR also enables superior content extraction for "complicated" language fonts such as Chinese and Japanese, thus opening new opportunities for KYC-regulated services in global markets.

(PRWeb March 22, 2017)

Read the full story at http://www.prweb.com/releases/2017/03/prweb14169543.htm


          AU10TIX Reports +65% Growth in Client Base and X3 in Traffic Volume in 2016        

AU10TIX reports record results in 2016, highlighted by more than 65% growth in the number of clients and partners and a tripling of the processed volume. 2016 also saw further strengthening of AU10TIX's technology leadership with new capabilities such as Deep-Learning enhanced Selfie-to-ID face matching, a new mobile SDK and broader global coverage.

(PRWeb January 09, 2017)

Read the full story at http://www.prweb.com/releases/2017/01/prweb13954419.htm


          AU10TIX Releases Deep-Learning Based ID-to-Selfie Face Comparison Beta        

An advanced deep learning algorithm enables the AU10TIX face comparison service to compare face photos retrieved from ID documents with selfie pictures captured by device cameras more quickly and more accurately. AU10TIX Deep Learning technology helps overcome the problems conventional face comparison has in handling image variations and image quality issues.

(PRWeb September 19, 2016)

Read the full story at http://www.prweb.com/releases/2016/09/prweb13693691.htm


          Facebook's giving away servers for AI: So what does it get in return?        
In a drive to stimulate AI research, Facebook's donation of GPU-accelerated servers to institutes across Europe is also its way of targeting deep-learning talent.
          Deeplearning.ai: Announcing New Deep Learning Courses on Coursera        

Article URL: https://medium.com/@andrewng/deeplearning-ai-announcing-new-deep-learning-courses-on-coursera-43af0a368116

Comments URL: https://news.ycombinator.com/item?id=14958779

Points: 322

# Comments: 92





          Artificial Intelligence's Stumbling Walk with Deep Learning        

For most animals, walking is instinctive. For humans, and for robots, it is something that takes a bit of learning. However, with the help of deep learning, a software robot learned to walk after only a little practice, and one day the robots in our lives may walk and run using the same tactics. Xue Bin Peng, Glen Berseth and Michiel van … from the University of British Columbia

The post Artificial Intelligence's Stumbling Walk with Deep Learning appeared first on roboturka.com |.


          5 TECHNOLOGIES THAT COULD SHAPE THE FUTURE        

From flying warehouses to robot toilets - five technologies that could shape the future
By Leandro L. Minku, Nervo Xavier Verdezoto D. & Stephan Reiff-Marganiec,
The Conversation, 27 July 2017.

Flying warehouses, robot receptionists, smart toilets…do such innovations sound like science fiction or part of a possible reality? Technology has been evolving at such a rapid pace that, in the near future, our world may well resemble that portrayed in futuristic movies, such as Blade Runner, with intelligent robots and technologies all around us.


But what technologies will actually make a difference? Based on recent advancements and current trends, here are five innovations that really could shape the future.

1. Smart homes

Credit: Pixaline/Pixabay

Many typical household items can already connect to the internet and provide data. But much smart home technology isn’t currently that smart. A smart meter just lets people see how energy is being used, while a smart TV simply combines television with internet access. Similarly, smart lighting, remote door locks or smart heating controls allow for programming via a mobile device, simply moving the point of control from a wall panel to the palm of your hand.

But technology is rapidly moving towards a point where it can use the data and connectivity to act on the user’s behalf. To really make a difference, technology needs to fade more into the background - imagine a washing machine that recognises what clothes you have put into it, for example, and automatically selects the right programme, or even warns you that you have put in items that you don’t want to wash together. Here it is important to better understand people’s everyday activities, motivations and interactions with smart objects to avoid them becoming uninvited guests at home.

Such technologies could even work for the benefit of all. The BBC reports, for example, that energy providers will “reduce costs for someone who allows their washing machine to be turned on by the internet to maximise use of cheap solar power on a sunny afternoon” or “to have their freezers switched off for a few minutes to smooth demand at peak times.”

A major concern in this area is security. Internet-connected devices can and are being hacked - just recall the recent ransomware attack. Our home is, after all, the place where we should feel most secure. For them to become widespread, these technologies will have to keep it that way.

2. Virtual secretaries

Credit: ibmphoto24/Flickr, CC BY-NC-ND 2.0.

While secretaries play a very crucial role in businesses, they often spend large parts of their working day with time-consuming but relatively trivial tasks that could be automated. Consider the organisation of a “simple” meeting - you have to find the right people to take part (likely across business boundaries) and then identify when they are all available. It’s no mean feat.

Tools such as doodle.com, which compare people’s availability to find the best meeting time, can help. But they ultimately rely on those involved actively participating. They also only become useful once the right people have already been identified.

By using context information (charts of organisations, location awareness from mobile devices and calendars), identifying the right people and the right time for a given event became a technical optimisation problem that was explored by the EU-funded inContext project a decade ago. At that stage, technology for gathering context information was far less advanced - smart phones were still an oddity and data mining and processing was not where it is today. Over the coming years, however, we could see machines doing far more of the day-to-day planning in businesses.
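
To make the "technical optimisation problem" concrete, the very core of it can be sketched in a few lines of Python. The calendars and hours below are invented, and a real system would add many more constraints (time zones, rooms, organisational charts), so treat this as an illustration only.

# Each person's free hours on a given day (hypothetical data).
availability = {
    "Alice":   {9, 10, 11, 14, 15},
    "Bob":     {10, 11, 13, 14},
    "Charlie": {11, 14, 16},
}

required = ["Alice", "Bob", "Charlie"]

# Hours when every required attendee is free.
common = set.intersection(*(availability[p] for p in required))

# Pick the earliest common slot, if any.
meeting_hour = min(common) if common else None
print(meeting_hour)  # -> 11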

Indeed, the role of virtual assistants may go well beyond scheduling meetings and organising people’s diaries - they may help project managers to assemble the right team and allocate them to the right tasks, so that every job is conducted efficiently.

On the downside, much of the required context information is relatively privacy-invasive - but then the younger generation is already happily sharing their every minute on Twitter and Snapchat and such concerns may become less significant over time. And where should we draw the line? Do we fully embrace the “rise of the machines” and automate as much as possible, or retain real people in their daily roles and only use robots to perform the really trivial tasks that no one wants to do? This question will need to be answered - and soon.

3. AI doctors

Credit: Brother UK/Flickr, CC BY 2.0.

We are living in exciting times, with advancements in medicine and AI technology shaping the future of healthcare delivery around the world.

But how would you feel about receiving a diagnosis from an artificial intelligence? A private company called Babylon Health is already running a trial with five London boroughs which encourages consultations with a chatbot for non-emergency calls. The artificial intelligence was trained using massive amounts of patient data in order to advise users to go to the emergency department of a hospital, visit a pharmacy or stay at home.
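
The triage idea is, at its core, a three-way classification problem. The sketch below is not Babylon's system; it is a deliberately tiny scikit-learn example on invented symptom features, only to show the shape of "train on past cases, then advise one of three actions".

from sklearn.tree import DecisionTreeClassifier

# Invented training data: [fever_celsius, chest_pain (0/1), days_of_symptoms]
cases = [
    [39.5, 1, 1],
    [37.0, 0, 2],
    [38.2, 0, 5],
    [40.1, 1, 3],
    [36.8, 0, 1],
    [38.9, 0, 7],
]
# Advice recorded for each past case.
advice = ["emergency", "stay_home", "pharmacy",
          "emergency", "stay_home", "pharmacy"]

model = DecisionTreeClassifier().fit(cases, advice)

# A new (hypothetical) caller's symptoms.
print(model.predict([[38.5, 0, 6]]))  # likely -> ['pharmacy'] on this toy data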

The company claims that it will soon be able to develop a system that could potentially outperform doctors and nurses in making diagnoses. In countries where there is a shortage of medical staff, this could significantly improve health provision, enabling doctors to concentrate on providing treatment rather than spending too much time on making a diagnosis. This could significantly redefine their clinical role and work practices.

Elsewhere, IBM Watson, the CloudMedx platform and Deep Genomics technology can provide clinicians with insights into patients’ data and existing treatments, help them to make more informed decisions, and assist in developing new treatments.

An increasing number of mobile apps and self-tracking technologies, such as Fitbit, Jawbone Up and Withings, can now facilitate the collection of patients’ behaviours, treatment status and activities. It is not hard to imagine that even our toilets will soon become smarter and be used to examine people’s urine and faeces, providing real-time risk assessment for certain diseases.

Nevertheless, to enable the widespread adoption of AI technology in healthcare, many legitimate concerns must be addressed. Already, usability, health literacy, privacy, security, content quality and trust issues have been reported with many of these applications.

There is also a lack of adherence to clinical guidelines, ethical concerns, and mismatched expectations regarding the collection, communication, use, and storage of patient’s data. In addition, the limitations of the technology need to be made clear in order to avoid misinterpretations that could potentially harm patients.

If AI systems can address these challenges and focus on understanding and enhancing existing care practices and the doctor-patient relationship, we can expect to see more and more successful stories of data-driven healthcare initiatives.

4. Care robots

Credit: SoftBank Robotics

Will we have robots answering the door in homes? Possibly. At most people's homes? Even if they are reasonably priced, probably not. What distinguishes successful smart technologies from unsuccessful ones is how useful they are. And how useful they are depends on the context. For most, it's probably not that useful to have a robot answering the door. But imagine how helpful a robot receptionist could be in places where there is a shortage of staff - in care homes for the elderly, for example.

Robots equipped with AI such as voice and face recognition could interact with visitors to check who they wish to visit and whether they are allowed access to the care home. After verifying that, robots with routing algorithms could guide the visitor towards the person they wish to visit. This could potentially enable staff to spend more quality time with the elderly, improving their standard of living.

The AI required still needs further advancement in order to operate in completely uncontrolled environments. But recent results are positive. Facebook‘s DeepFace software was able to match faces with 97.25% accuracy when tested on a standard database used by researchers to study the problem of unconstrained face recognition. The software is based on Deep Learning, an artificial neural network composed of millions of neuronal connections able to automatically acquire knowledge from data.

5. Flying warehouses and self-driving cars

Credit: JeffLupient/DeviantArt

Self-driving vehicles are arguably one of the most astonishing technologies currently being investigated. Despite the fact that they can make mistakes, they may actually be safer than human drivers. That is partly because they can use a multitude of sensors to gather data about the world, including 360-degree views around the car.

Moreover, they could potentially communicate with each other to avoid accidents and traffic jams. More than being an asset to the general public, self-driving cars are likely to become particularly useful for delivery companies, enabling them to save costs and make faster, more efficient deliveries.

Advances are still needed in order to enable the widespread use of such vehicles, not only to improve their ability to drive completely autonomously on busy roads, but also to ensure a proper legal framework is in place. Nevertheless, car manufacturers are engaging in a race against time to see who will be the first to provide a self-driving car to the masses. It is believed that the first fully autonomous car could become available as early as the next decade.

The advances in this area are unlikely to stop at self-driving cars or trucks. Amazon has recently filed a patent for flying warehouses which could visit places where the demand for certain products is expected to boom. The flying warehouses would then send out autonomous drones to make deliveries. It is unknown whether Amazon will really go ahead with developing such projects, but tests with autonomous drones are already successfully being carried out.

Thanks to technology, the future is here - we just need to think hard about how best to shape it.

Top image credit: NEC Corporation of America/Flickr, CC BY 2.0.

[Source: The Conversation. Images added.]


          Will Deep Learning Scale to Supercomputers?        

Are supercomputers practical for Deep Learning applications? Over at the Allinea Blog, Mark O'Connor writes that a recent experiment with machine learning optimization on the Archer supercomputer shows that relatively simple models run at sufficiently large scale can readily outperform more complex but less scalable models. "In the open science world, anyone running a HPC cluster can expect to see a surge in the number of people wanting to run deep learning workloads over the coming months."

The post Will Deep Learning Scale to Supercomputers? appeared first on insideHPC.


          High-speed light-based systems could replace supercomputers for certain ‘deep learning’ calculations        
A team of researchers at MIT and elsewhere has developed a new approach to deep learning systems — using light instead of electricity, which they say could vastly improve the speed and efficiency of certain deep-learning computations. Deep-learning systems are based on artificial neural networks that mimic the way the brain learns from an accumulation of [...]
          A deep-learning tool that lets you clone an artistic style onto a photo        
“Deep Photo Style Transfer” is a cool new artificial-intelligence image-editing software tool that lets you transfer a style from another (“reference”) photo onto your own photo, as shown in the above examples. An open-access arXiv paper by Cornell University computer scientists and Adobe collaborators explains that the tool can transpose the look of one photo [...]
          New Deep Learning Courses Released on Coursera, with Hope of Teaching Millions the Basics of Artificial Intelligence        

FYI: If you follow edtech, you know the name Andrew Ng. He's the Stanford computer science professor, who co-founded MOOC-provider Coursera and later became chief scientist at Baidu. Since leaving Baidu, he's been working on three artificial intelligence projects, the first of which he unveiled yesterday. On Medium, he wrote: I have been working on […]

New Deep Learning Courses Released on Coursera, with Hope of Teaching Millions the Basics of Artificial Intelligence is a post from: Open Culture. Follow us on Facebook, Twitter, and Google Plus, or get our Daily Email. And don't miss our big collections of Free Online Courses, Free Online Movies, Free eBooks, Free Audio Books, Free Foreign Language Lessons, and MOOCs.


          What is Deep Learning & Marketing Technology? Definition, How It Works, Best Practices, and Benefits of Deep Learning & Marketing Technology        

Definition of Deep Learning & Marketing Technology: Deep learning refers to the algorithm-based machine learning techniques that are used to process data. The inspiration for deep learning comes from the human brain, which is composed of neural networks. Deep learning technology uses multiple layers – just like the neural networks in our brain – between […]
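
The "multiple layers between input and output" idea can be sketched in a few lines of NumPy. This is only a forward pass through two hidden layers with random weights; real deep learning also needs a loss function and backpropagation to train those weights.

import numpy as np

rng = np.random.default_rng(0)

def layer(x, n_out):
    """One fully connected layer with a ReLU non-linearity (random weights)."""
    w = rng.normal(size=(x.shape[-1], n_out))
    b = np.zeros(n_out)
    return np.maximum(0, x @ w + b)

x = rng.normal(size=(1, 8))           # one input example with 8 features
h1 = layer(x, 16)                     # first hidden layer
h2 = layer(h1, 16)                    # second hidden layer
out = h2 @ rng.normal(size=(16, 1))   # output score
print(out.shape)                      # (1, 1)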

The post What is Deep Learning & Marketing Technology? Definition, How It Works, Best Practices, and Benefits of Deep Learning & Marketing Technology appeared first on CallMiner.


          Nvidia Launches Pascal GPUs for Deep Learning Inferencing        

Already entrenched in the deep learning community for neural net training, Nvidia wants to secure its place as the go-to chipmaker for datacenter inferencing. At the GPU Technology Conference (GTC) in Beijing Tuesday, Nvidia CEO Jen-Hsun Huang unveiled the latest additions to the Tesla line, Pascal-based P4 and P40 GPU accelerators, as well as new software all aimed at improving performance for inferencing workloads that undergird applications like voice-activated assistants, spam filters, and recommendation engines.

The post Nvidia Launches Pascal GPUs for Deep Learning Inferencing appeared first on HPCwire.


          Google RankBrain: Content Truly Reigns Over the SEO Landscape        


Welcome to the democratization of content and information.

RankBrain is the super-smart, three-month-old brainchild of five Google engineers and a deep-learning expert. What has come to fruition after an arduous year of research and development (along with Google's five-year-long movement toward A.I.) is Google's recent incorporation of artificial intelligence into its 2013 algorithm update, Hummingbird.

RankBrain now handles 15% of search queries by organizing vast amounts of language into mathematical representations, called vectors, which the computer can "understand" and make connections with. This means Google can better understand a search query and tie it to relevant, quality results, regardless of how often it's searched for or how new the search is.

Greg Corrado, a senior research scientist at Google, told the Washington Post that while RankBrain is just one of hundreds of signals that determine rank, it has quickly become the third most important, and by turning off this feature it “would be as damaging to users as forgetting to serve half the pages on Wikipedia”.

For a deeper learning experience, read a post by Kristine Schachinger and learn more about Hummingbird, entity search (how an algorithm recognizes combinations of and the order of words), and Google’s new A.I., RankBrain.

What This Means for Search Results

Google is now able to make connections across the meanings of search queries using RankBrain’s pattern recognition learning skills (rather than human work to connect synonyms with each other) so that when something is not searched for often, or is a new search, RankBrain can still make a great guess as to what content will appeal most to those searchers.

This is because RankBrain is a learning machine, which means as time goes on it will become better and better at identifying what content is best for any keyword based on pattern recognition across language used in the pages that have been indexed. This is how semantics will be determined by populations at large, rather than Google engineers and other Google-hired persons. Long live semantic search!

Here’s an example:

Before, using entity search (along with the human work that went into connecting synonyms), Google could connect the hypothetically uncommon search query “How many ENT doctors does it take to fix sinusitis” more closely to a page that says:

“FAQ: How many Otolaryngologists will I need to visit to clear my sinusitis?

Answer: It should only take one skilled Otolaryngologist to help you with your sinusitis and allergy problems.”

then to a page that says:

“..when I ask a patient how many years they’ve been experiencing sinusitis, and ask “how long does it take” for them to start noticing a change in symptoms after taking… as part of a team of experienced ENT doctors…”

Now, Google can connect “How many ENT doctors does it take to fix sinusitis” with other related language based on the patterns it’s identified across indexed pages and therefore connect this search with results that fit those same language patterns. So perhaps the page that would rank would say:

“…each of our highly skilled Otolaryngologists are certified by the American Board of Otolaryngology (ABO) and will be able to handle any sinusitis problems you may be experiencing. This includes…”

[image: Google RankBrain example]
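
A rough way to picture the "language as vectors" idea: queries and pages are mapped to lists of numbers, and closeness is measured with something like cosine similarity. The three-dimensional vectors below are invented stand-ins (real systems learn hundreds of dimensions from text), so this illustrates the mechanics, not how RankBrain actually scores pages.

import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Invented embeddings: nearby meanings get nearby vectors.
vectors = {
    "ent_doctor":       np.array([0.9, 0.1, 0.3]),
    "otolaryngologist": np.array([0.8, 0.2, 0.35]),
    "plumber":          np.array([0.1, 0.9, 0.2]),
}

query = vectors["ent_doctor"]
for word, vec in vectors.items():
    print(word, round(cosine(query, vec), 3))
# "otolaryngologist" scores far closer to the query than "plumber" does.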

What This Means for Investing in SEO

So, now that Google’s so clever, is it all over for SEO? Heck no! If you know how to gain insight into your audience and their needs using keyword research and can translate this into content that they’ll love – you’ll excel as a top competitor in the search results; it’s a good thing that as Google’s A.I. learns, it will continuously give us better information in order to do just that.

Here’s a little glimpse of TopRank Marketing’s approach to integrating SEO and Content Marketing in a way that keeps us competitive in a quickly-changing SEO environment.

Header image via Shutterstock

The post Google RankBrain: Content Truly Reigns Over the SEO Landscape appeared first on Newsroom.


          Machine and Deep Learning in Python: What You Need to Know        

Big Data. Deep Learning. Data Science. Artificial Intelligence. It seems like a day doesn't go by when we're not bombarded with these buzzwords. But what's with all the hype? And how can you use it in your own business? What is Machine Learning? At its simplest level, machine learning is simply the process of optimizing [...]
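
As a tiny, concrete example of "optimizing" in that sense, here is a linear model fit with scikit-learn on made-up numbers; fitting is just finding the slope and intercept that minimize the prediction error.

from sklearn.linear_model import LinearRegression

# Made-up data: y is roughly 3*x + 2 with a little noise.
X = [[0], [1], [2], [3], [4]]
y = [2.1, 5.0, 7.9, 11.2, 13.8]

model = LinearRegression().fit(X, y)      # "optimize" the slope and intercept
print(model.coef_[0], model.intercept_)   # close to 3 and 2
print(model.predict([[10]]))              # roughly 32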

The post Machine and Deep Learning in Python: What You Need to Know appeared first on DevelopIntelligence Blog.


          Intel Democratizes Deep Learning Application Development with Launch of Movidius Neural Compute Stick        

On July 20, 2017, Intel launched the Movidius™ Neural Compute Stick, the world’s first USB-based deep learning inference kit and self-contained artificial intelligence (AI) accelerator that delivers dedicated deep neural network processing capabilities to a wide range of host devices at the edge. Designed for product developers, researchers and makers, the Movidius Neural Compute Stick aims to … Continued

The post Intel Democratizes Deep Learning Application Development with Launch of Movidius Neural Compute Stick appeared first on Intel Newsroom | Deutschland und Österreich.



          Design Patterns for Deep Learning Architectures        

Deep Learning can be described as a new machine learning toolkit that has a high likelihood to lead to more advanced forms of artificial intelligence. The evidence for this is in the sheer number of breakthroughs that have occurred since the beginning of this decade. There is a newfound optimism in the […]

          è©±é¡Œã«ãªã‚ŠãŒã¡ãªäººå·¥çŸ¥èƒ½ã®è«–点を改めて整理してみる        

While reading "The Day Artificial Intelligence Surpasses Humanity" in this month's issue of Newton, I felt the need to organize the talking points around the topic of artificial intelligence.

The reason is that recent AI features mostly revolve around invoking the word "singularity" and claiming that AI will surpass humanity by 2045. They usually combine this with deep learning and the topic of AI taking people's jobs, and then line up plausible-looking current examples. Newton does this, and most of the AI features I have seen recently follow the same pattern. What was somewhat interesting about Newton was its introduction of the "Artificial Brain Project: Can a Robot Get into the University of Tokyo?", which lays out the technical challenges a computer faces in understanding physics problems from the National Center Test. I think this is a fairly interesting topic, though as subject matter it feels a bit niche.

The following book is very convenient for organizing the talking points around AI.

What I found particularly good is its chapter structure, so let me excerpt it.

  • What counts as "artificial intelligence"?
  • The scariness of AI lies in its predictive accuracy
  • AI's strengths and weaknesses are coming into view
  • Will creative computers emerge?
  • The core lies in judgments about "approximation"
  • The mechanism of "prediction" is coming into view
  • Is deep learning really that amazing?
  • Can you get a jump on everyone else with stock price prediction?
  • Big data will create new nations
  • Who does big data belong to?
  • Can the brain be copied as it is?
  • Can you release your own "consciousness" onto the net?
  • "Getting bored" is a trait machines don't have
  • Can humans draw out more of the dolphin's "intelligence"?
  • Will AI expand the domain of the "human"?
  • A society that overlooks no crime is coming
  • Big data and AI will expose inequality
  • Self-driving cars aren't all good news!?
  • We will face decisions that weigh lives on a scale
  • Should machines that coexist with humans have emotions?
  • Economic activity will start to change in real time
  • Robots need a lifespan
  • Can love be conveyed to a computer?
  • What and how should humans learn?
  • How do we follow up with kids who are good at studying and kids who aren't?
  • Will education really be flattened?
  • Can AI raise the floor for "today's young people"?
  • Will doctors and consultants be rated too?
  • Will AI move from wearables to direct connection with the human body?
  • Where will human dignity be preserved?

This covers broadly the topics that can be covered in a book at this point in time. In that sense it is a very good first book for organizing today's AI discussion points. Of course it is only a first step, but understanding the overall landscape of issues like this actually leads to understanding the individual topics, so I recommend it to anyone who has not read it yet.

Now, looking over the list this way, quite a lot of things come into view.

First, the importance of deep learning as a method. You can basically assume that about half of recent AI stories have this technique, and the advances in computation that support it, in the background. There are plenty of good materials up on SlideShare, so anyone interested should take a look.

"Deep learningの軽い紹介" (A light introduction to deep learning) from Yoshihisa Maruya

Next, the connection to real-world problems such as big data, healthcare, and education. These areas have already become business territory, so they are easy to catch up on.

Finally, the relationship between AI and human work. The standard topics here are the connection to creativity, AI taking jobs, and what AI can actually do; if you push these far enough you arrive at the topic commonly called the 2045 problem, namely that AI is dangerous. This "AI is dangerous" idea is actually quite interesting. Summarizing various people's views in my own way: in addition to the view of living things as systems for maintaining DNA, Richard Dawkins' "The Selfish Gene" proposed the meme, a system for maintaining the information that sustains society and culture. "The Selfish Gene" and the related "The Blind Watchmaker" are wonderfully interesting books, so I encourage anyone interested to read them; but thinking in terms of the meme, the system called AI is far better than humans at "preserving and developing information", so the claim, I think, is that AI may eventually replace humans in that role. Written out like this, it is quite a metaphysical claim.

This claim is in fact discussed in Nicholas Carr's latest book "The Glass Cage" (published in Japan as "オートメーション・バカ") in the form of "the relationship between routine work and creativity"; read together, the two form a rather interesting structure in which this becomes the AI topic of the relationship to creativity and the relationship between automation and humans. I have not yet read an article that properly pulls this area together, so I would like to take a shot at summarizing it myself.

In any case, that is what I found myself thinking while reading Newton and the book above: perhaps the talking points around artificial intelligence can be organized along these lines.


          AI pioneer Andrew Ng’s online courses let anyone learn about deep learning        

Deep learning is one of today’s hottest trends in the technology industry, but it is a complicated subject for anyone who is not a data scientist or an engineer. Coursera Inc. co-founder Andrew Ng, who is one of the leading pioneers in the ongoing artificial intelligence revolution, wants to make it easier for more people to […]

The post AI pioneer Andrew Ng’s online courses let anyone learn about deep learning appeared first on SiliconANGLE.


          What is Deep Learning ??        
This article is basically just a translated version of an earlier article on deep learning in Bahasa Indonesia. This time I want to share what deep learning is, at least as far as I have learned until this day (when this…
          GPU-based Deep Learning Enhances Drug Discovery Says Startup        

Sifting the avalanche of life sciences (LS) data for insight is an interesting and important challenge. Many approaches are used with varying success. Recently, improved hardware – primarily GPU-based – and better neural networking schemes are bringing deep learning to the fore. Two recent papers report that the use of deep neural networks is superior to typical machine learning (a support vector machine model) in sieving LS data for drug discovery and personalized medicine purposes.

The post GPU-based Deep Learning Enhances Drug Discovery Says Startup appeared first on HPCwire.


          Nvidia shows the Tesla V100 (GTC 2017)        
At GTC 2017, Nvidia presented, among other things, the V100, an accelerator card with a Volta GPU designed for deep learning.
          Nvidia GTC 2017 Keynote        
At GTC 2017, Nvidia presented, among other things, the V100, an accelerator card with a Volta GPU designed for deep learning.
          Tetra raises a $1.5M seed round to bring deep learning to voice transcription        
 There are a million and one services for voice transcription on the market. But even with just one job to do, I’ve never seen a service that can handle the long tail of vocabulary used in the real world. This is particularly challenging if you’re a startup trying to sell your service to enterprises that rely on accurate transcription for their operations. Read More
          Cloudworld: A Hegelian Theory of Complexity and Algorithmic Reality        
Philosophy could be an important conceptual resource in the determination of human-technology interactions for several reasons. First, philosophy concerns the topics of world, reality, self, society, aspirations, and meaning, all of which we are hoping to reconfigure and accentuate in our relations with technology. Improving human lives is after all one of the main purposes of technology. Second, philosophy relates to thinking, logic, reasoning, and being, which are the key properties of what we would like our technology entities to do. We would like our technology entities to be more like persons: pre-uncanny valley but fully-fledged tech others; thinker helpers, empathic listeners, coaches, optimizers; a new kind of technology-presenced companion. However, ensconced in recent computational advances, it has been neglected to look to thinking about thinking as a primary resource. Third, philosophy treats the grasping and naming of new things in the world, which is precisely helpful in the case of new and quickly-emerging technological realities.

Hegel could be a potentially helpful position in the consideration of the governance of emerging technologies. This is because the Hegelian reference point is specifically a moving dialogical expanding and not a pre-specified moment in response to unfolding situations. The Hegelian method involves triads: there is the thing itself, its negation, and a bigger third position that sublates the truth content out of the two previous positions into a new shape of its own consciousness. This kind of conceptual robustness could help in articulating more nuanced positions regarding emerging technologies and moving beyond stark binaries like ‘adopt-or-don’t adopt,’ technological dualism that ‘any technology has both good and evil uses,’ and a seemingly inevitable hopelessness in the face of existential risk.

The current situation of emerging technology is one of algorithmic reality. Not only are more new kinds of technology entities having a substantial presence in our human reality, where we are interacting with them on a regular basis, there is a sense of a quickening progression of these entities. There are drones, self-driving cars, personal home robots, quantified-self gadgets, Siri-commanded mobile phones, blockchain smart contract DACs, tradenets, deep-learning algorithms, big data clouds, brain-computer interfaces, neural hacking devices, augmented reality headsets, and deep-learning gaming worlds. Further, each of these technology classes is itself a platform, network, and app store, where the implication is cloudworld. Cloudworld is the notion of a deep multiplicity of networks as a compositional element of new algorithmic realities, where every network is a Turing-complete general computational substrate for every other. Any technology can immediately ‘grok,’ simulate, and run any other; the meaning of which from our human standpoint is vastly unclear. Derivatively, any sort of cloudmind (clustered interactions between multiple human minds or entities (e.g.; artificial intelligence) coordinated via the Internet cloud) might run on any platform.

A Hegelian theory of algorithmic reality is a complexity philosophy position, meaning that it has the properties of a complex adaptive system in being nonlinear, emergent, dynamic, open, unknowable, self-organizing, and interdependent. A complexity philosophy position is required to congruently correspond to the underlying reality which is itself complex. Algorithmic reality is not just an increasing degree of human-technology entity interaction but a multiplicity and proliferation of classes of network technology entities. The Hegelian position is exactly one that might constitute a bigger yes-and collaboration space that expansively accommodates all parties.

Inspiration: Minsky's legacy in the context of contemporary and near-future AI

          Machine Trust Language (MTL): Human-Machine Collaboration        
Andreas Antonopoulos’s articulation of network-enforced trust primitives (Oct 2015, Feb 2014) could be extended more broadly into the concept of Machine Trust Language (MTL). While blockchains are being popularly conceived as trust machines, and as a new mode of creating societal shared trust, Andreas addresses how at the compositional level, this trust is being generated. The key idea is thinking in terms of a language of trust, of its primitives, its quanta, its elemental pieces, its phonemes, words, and grammar that can be assembled into a computational trust system.

Blockchains are a network-centric trust system that can make and enforce promises. A network is not just a decentralized architecture; a network can have functional properties built into it. Network-centric or network-enforced functionality can thus enable a more complex level of activity. As XML standardized, facilitated, and undergirded Internet I: the Internet of information transfer, MTL could similarly for the Internet II: the Internet of value transfer.

Trust Primitives: Technical Details
The atomistic building blocks of trust, trust primitives, arise from blockchain scripting languages; they are the programming functions or opcodes used to specify different situations. Some examples are OP_CHECKSIG (a script opcode used to verify that a signature is valid) and OP_CHECKLOCKTIMEVERIFY (a script opcode used for a transaction output to be made unspendable until some point in the future).

As human language components are aggregated into different levels (phonemes, morphemes, lexemes, syntax, and context), so too can blockchain trust primitives. These indivisible blockchain trust particles, trust quanta, can be assembled into larger trust structures like payments. One example could be a micropayment channel with bidirectional settlement for vendor payment, for example entered in 1000 blocktime confirmations for 10 millibits. There could be libraries of standard trust primitives that are always included, for example, to verify the signature or multi-signature status of any transaction. The possibility of fine-grained trust primitives is limitless – a very small instruction set can be used as a toolkit for innovation that is composed into infinitely complex macro expressions. Some other examples Andreas mentions in addition to payment channels are stealth addresses, payment codes, and multisig escrows.
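
To make the "primitives composed into larger trust structures" idea more tangible, here is a purely illustrative Python sketch: opcodes represented as strings and assembled into standard script templates (a 2-of-3 multisig escrow and a pay-to-pubkey-hash). It is not a real Bitcoin library and does no cryptographic checking; only the composition step is shown.

# Illustrative only: opcode names as strings, no real signature verification.
def multisig_script(required, pubkeys):
    """Compose a k-of-n multisig locking script from primitive opcodes."""
    return ([f"OP_{required}"]
            + list(pubkeys)
            + [f"OP_{len(pubkeys)}", "OP_CHECKMULTISIG"])

def p2pkh_script(pubkey_hash):
    """Classic pay-to-pubkey-hash script, built from the same kind of primitives."""
    return ["OP_DUP", "OP_HASH160", pubkey_hash,
            "OP_EQUALVERIFY", "OP_CHECKSIG"]

escrow = multisig_script(2, ["<buyer_key>", "<seller_key>", "<arbiter_key>"])
print(escrow)
# ['OP_2', '<buyer_key>', '<seller_key>', '<arbiter_key>', 'OP_3', 'OP_CHECKMULTISIG']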

More sophisticated examples of in-built blockchain trust are already starting to become conceptual standards. One is Lighthouse, a cryptowallet that has crowdfunding (the ability to pledge funds to an address) as an incorporated feature; essentially a decentralized network Kickstarter program. The Kickstarter functionality is in the program (there is no custodian); just as Bitcoin allows digital currency transfers without a central bank, so too the Lighthouse wallet coordinates crowdfunding for projects without a central intermediary like Kickstarter. A whole series of similar network primitives with embedded trust functionality can be envisioned. These could include crowdfunding, reputation-checking, backfeeding (emergent collaboration), insurance, multisig, payment channels, peer-to-peer tipping (ProTip), compensation, remuneration, micropayments, IP tracking, backup (specified blockchain transaction record-keeping and archival), and advocacy (via third-party oracle like Smart Contract and Early Temple).

Trust as a Feature: Human-Machine Social Contracting
When trust becomes a ‘mere’ assumed included feature as opposed to a marveled at and explicitly designed functionality, we will have really arrived somewhere as a species. In some sense, the entire apparatus and infrastructure known as society has been produced to instill and manage trust. Deception had an evolutionary benefit, but is perhaps a quality that can be reconfigured, first in machine-mediated human interaction, and later in human biology. The longer-term endgame of blockchains-as-algorithmic-trust is human-machine collaboration, particularly in the application of shifting from the labor economy to the actualization economy. Given the increasing potential prevalence of machines in human existence, a looming topic is the kinds of social contracts that may be appropriate to establish between machines and humans. For example, consider what trust primitives might be needed to write a smart contract with your personalized home robot. To open a payment channel with your home robot, first could be identifying the relevant exchange streams for services and data. These might include personal data, life-logging, backup, diagnostics, advice, empathy, sound-boarding, home maintenance services, payments, and record-keeping; a list of operations that make sense to conduct in a ‘payment channel’ structure (e.g.; two-way open transfer over time of value between parties per triggering events).

A New Kind of Language
Here the concept would be considering the possibility space of all language and noticing that there could likely be a bigger range of language than has come into existence so far. There are human languages, computational languages, math, logic, and other systems of semantics and signifying. As seen with examples like math (Husserl), computing algorithms (Wolfram), intelligence (Yudkowsky), and self-assembled locomotion (Lipson) and life forms, what has been seen through the human example may be but a few nodes in a larger possibility space. The bigger query would be what new kinds of language can be made with blockchain trust primitives. Not just solving human problems (e.g.; creating automated trust structures) but creating new languages from these new functionalities. One next step could be applying linguistic theory (Chomsky, etc.), concept theory (Lakoff, Kant, etc.), and mathematics, logic, computation, complexity math, machine-learning, and deep-learning theory to creating platforms for the emergence of new kinds of language. The first task might be to optimize for obvious new types of trust language that might be possible and that might solve low-hanging fruit problems like offloading the cognitive and behavioral energy effort of deception to move to Brin’s Transparent Society. Blockchain trust could be for society what the quantified self fourth-person perspective was for the individual (a trustable independent objective arbitrator of information about reality).

Philosophy: A New Kind of Qualitative Language
A language of trust is undeniably qualitative. Trust is exactly the qualitative easing necessary for society to function, including in more intensive human-machine collaborations, and in larger scale universally-global and extraterrestrial singularity-class endeavors. Is it possible to reach a place with computational language to say what cannot be said with human language? Perhaps not in traditional 1s/0s computational language, but with a new kind of language of qualitative trust primitives, maybe yes. Wittgenstein famously said (the type of) all there is that can be said in the Tractatus, and in this crystallization pointed to what cannot be said, in three domains, ethics, aesthetics, and religion. Now thinking in terms of trust primitives and other qualitative primitives changes the question of what kinds of sentences and language can be written; the grammar and Wittgensteinian language games that can be enacted with blockchains; in an AI DAC and other applications. There could be many diverse blockchain cliometrics implementations in MTL; e.g.; the measurement of social qualitative factors like the amount of liberty in a political system. The notion is qualitative primitives and qualitative machine language; having a pourable bag of trust elements as components. There are trust primitives, and possibly many other kinds of qualitative primitives, for example freedom, autonomy, and choice primitives; idea primitives and innovation primitives; all of these could be on tap in a multi-faceted qualitative machine language to configure a life of crypto enlightenment

          VR Chains and DAC Brains: Upload your mind as a VR AI DAC        
Blockchain thinkers or DAC Brains are the notion of having DAO/DAC entities running with smart contracts on blockchains for the purpose of conducting thinking operations. The genesis of blockchain thinkers could be organic or inorganic: human mindfile lifelogs and uploads, and any variety of brain emulations and AI ML/DL algorithms (artificial intelligence machine-learning deep-learning algorithms). One idea is to instantiate your mindfile on the blockchain as a lifelogging tracker and standalone ideation tool: your own mind as an AI DAC. Some key enablers are coming together to make personal AI DACs possible. Idea chains (lifelogging your ideas onto a blockchain) could auto-record your ideas through 1) QS (quantified self)-attached gamma wave spike tracking (recording when you are having an idea), together with 2) cortical image recognition and thought identification (what the idea is about), logged into in a 3) personalized blockchain-based VR consensus reality (coordinating ideas into your own ongoing reality view).

Immersive Virtual Reality is Digitized Experience
Immersive VR (virtual reality), like with the Oculus Rift, is not just video games, virtual worlds, or 3-D CAVE environments, it is digitized experience. Qualitatively different, immersive virtual reality is a means of making physical world experiences real in an alternative medium. VR metaverses then, are parallel realities, as distinct from multiple digital worlds. If you and I go into WoW (World of Warcraft) or SL (Second Life) separately, we see the same world. Even if different levels of views are enabled or locked (like Karl Schroeder’s tech locks in Lady of Mazes [1]), they are just different lenses on the same world. However, if you and I construct our own digital worlds, we see and create different worlds, possibly on the same basic platform, but the realities can be fundamentally different, with different participants, events, and historical records.

Reality Unity in the Physical World
Consider the physical world - there is one platform, and we each have varying reality maps or views of the physical reality platform in our heads. There is one consensus reality and historical event record, and conflicts arise out of different views of the consensus reality trying to hew to one (e.g.; “What happened? X punched Y first. No, Y shoved X first.” – we seek a unique consensus reality of events (Probability Moon further explores the notion of societal shared reality)). Centralized virtual worlds have been the same; there is one reality platform, and centralized event engines record the consensus in one shared events ledger, the game history, even in OpenGL self-hosted models. Now, however, with decentralized models powered by blockchains and dapps, DAOs, and DACs, reality multiplicity is possible. There can be simultaneously existing parallel realities. The multiverse exists, and one place it can be created is in cyberspace.

Blockchains enable Simultaneous Multiple Realities
Just as blockchains are the critical enabling technology for digital cryptocurrencies, so too are they a key facilitator of VR multiverses. Blockchains could serve as the backbone infrastructure for multiple parallel realities (VR multiverses) by coordinating the chain of event histories in these multiple realities. The transaction history is not just for transactions, but more broadly comprises the historical event record. Blockchains consensus-generate the historical record, and allow any number of separate and parallel historical records to be created simultaneously. Blockchains are the mechanism for creating and coordinating simultaneous multiple realities. The altcoin space is already an example of simultaneous separate realities.

The Selectability of all Reality Features
Blockchains consensus-generate the historical record, and further, make it clear that all parameters of reality can be malleable and selectable: time, participation, reputation, memory, history (historicity), economic models (hierarchical or peer-based), and political operations (governance and decision-making). These are all selectable parameters of a reality environment. One recent revolution in economic liberation sensibility is that blockchains allow individuals and communities to self-determine economic systems. Now seen in the VR multiverse context, blockchains are revealed to be much more: they could enable all parameters of a reality environment to be selected.

Blocktime Time Malleability
One example of reality feature selectability is blocktime. The timeclock in blockchains is blocktime, the time it takes for blocks of transactions to confirm. The easiest way to specify future time moments (t+n) is via the internal time system of the blockchain, blocktime. For example, the term of a certain decentralized dapp loan might be 7000 block confirmations. Blocktime is the clocktime of blockchains. Certainly blocktime converts to physical world time, but differentials could arise and give way to arbitrage opportunities or other divergence-as-a-feature possibilities. The key point is that all reality parameters, including time and space, could become malleable in blockchains and especially in blockchain-coordinated VR metaverses. Further, if blockchains become the mechanism for keeping time and event histories, de facto they become memory, where memory is a critical functionality that feeds back directly into lifelogging and Brain-as-a-DAC idea chains.
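
A back-of-the-envelope sketch of converting blocktime to wall-clock time: if a loan term is quoted in confirmations, the conversion is just multiplication by an assumed average block interval (about 10 minutes on Bitcoin), and the fact that the real interval drifts is exactly where the divergence and arbitrage mentioned above would come from.

from datetime import timedelta

AVG_BLOCK_INTERVAL_MIN = 10        # assumed Bitcoin average; varies in practice

def blocktime_to_wallclock(confirmations, minutes_per_block=AVG_BLOCK_INTERVAL_MIN):
    """Rough wall-clock duration of a term expressed in block confirmations."""
    return timedelta(minutes=confirmations * minutes_per_block)

print(blocktime_to_wallclock(7000))   # -> 48 days, 14:40:00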

A World of Multiple Realities
All of reality can be made malleable, personalized, self-determined, personally-constructed, emergent, and a thing of multiplicity not monolithicity. There can be an end to the tyranny of a sole reality. “End reality tyranny, create your own VR multiverse!” Deleuze's multiple inner views can bloom as described in Proust and Signs. In the new sensibility of VR multiverse reality multiplicity, an imaginable query to alien intelligence could be a Kardashev scale parameter: “To what extent do you have multiple realities in your world?”

Right to Self-Determine One’s Own Realities
The earlier positions in human liberation have been the right to self-determination in certain contexts, in different parts of life and the experience of reality. These include the right to self-determination in governance, legal systems, IP protection/sharing regimes, software business models, neural data privacy rights, cognitive enhancement, and most recently, the emerging sensibility of the right to self-determine one’s own economic systems. These are all important steps in the liberty of the individual, but they are all in some sense intermediary positions on the way to the now-visible bigger position which is the right to self-determine one’s own overall reality, and really, realities (plural). A new sensibility could be seeing the right of each individual, entity (human and machine/technology entities), or group to self-define its own personal consensus reality (realities). The central component of the self-determination of organisms could be the operation of its own consensus reality(ies).

Blockchains as a Historicity Mechanism and Collaboration Space 
Blockchains are a means for consensus-generating the historical record (a historicity mechanism) to facilitate reality multiplicity, and they are the means of enabling value flow. In network economic theory, this is beyond the transactional sense of the value flow of currency from me to you, where unleashing the creation and transmission of many kinds of non-monetary value flows is the bigger picture of what is at stake and possible in creating multiple realities. Non-monetary currencies (like universal human needs for connection, contribution, mattering, and understanding) can be registered and tracked as blockchain-based smart assets. One reason for VR realities, what we are really wanting in creating new realities (via VR multiverses) is creating spaces that are free of the limiting constraints of physical realities. These constraints pertain to both the physical world and human limitations, including matter, gravity, time, illness, disability, impairment, sleep, recovery, distraction, cognitive bias, etc.) such that more freedom, exploration, collaboration, expression, creativity, fun, serendipity, progress, and contribution can be enabled. We want to cognitively enhance proximately for a better memory, sure, but ultimately to be 'bigger' in the sense of being more able to grow and participate beyond our initial position of self. We want more of the creative yes-and collaboration space of new energy and idea generation. The ‘economy’ of the future might be measured based on non-monetary value flows like ideation, which could be orchestrated by public and private reality blockchains.

Convergent Integration of Multiple Simultaneous Realities
Now possibly having a situation of multiple simultaneous realities, what is there to do with them? There are several implications for the future of privacy, sharing, and collaboration. For example, there is a question about when and how to cohere and merge VR DAC brain realities. Therefore, within realities, there might be sub-threads or other means of parsing and segmenting sub-realities.
Colored coin threads in your brain DAC could be the way to permission subreddit mind ledgers into cloudmind collaborations.
Mindchains could be a means to safely mindshare or collaborate in a cloudmind, for example by permissioning your subreddit ledger for ideation related to certain areas as opposed to your full mindfile or meat-brain… "here, let me share everything with you I've thought about crowdsourced genomic studies," or "here, join the mindslack channel for this community."

Blockchain apps could auto-merge shared realities in the way that topical queries are ambiently processed in the background now. There could be situations analogous to Hayek’s competitive currencies where reality views compete. There could be reality ecologies where repetitive threads across individual realities converge into shared group realities (the unobtrusively representative politics of the future). Right now this happens manually with the blunt tools of the physical world; we search for other individuals, groups, and institutions with our own shared values and reality view, and blockchain DACs might facilitate the automatic canvassing and convergence of all of this.

We might know that VR metaverses and the human-machine collaboration are really working when VR NPC DACs self-create in our realities per sensing our human needs (actualization, contribution, growth and learning, exploration, creation). Blockchain-based VR AI DACs could auto-sense and create whatever 'Tuscany houses' are needed to grow an entity (like a human or machine mind) in its progression. For example, in an ideas 'economy,' the most important inputs are anything which facilitates the development of ideas, and attending to this could be one purpose of a an NPC VR AI DAC in your personal VR metaverse, operating via smart contracts on your mindchain. Ideas are the demurrage-redistributable basic income of a blockchain thinker Brain DAC. Blockchain thinker Brain DACs then become another Ubiquitous Grid Resource, an important one, for idea generation, in the overall picture of the future Network Economies of Abundance.

Acknowledgement: This post was inspired by ideas from Maciej Olpinski regarding consensus in virtual reality worlds.

[1] POV HUDs are a mechanism to accommodate multiple levels of technology adoption within a society; e.g.; through my HUD, I see unimproved nature and birds tweeting; through your HUD, you see propositional nanotech 3-D printed finery self-mutating in utility fogs.

          Wrenching Efficiency Out of Custom Deep Learning Accelerators        

Custom accelerators for neural network training have garnered plenty of attention in the last couple of years, but without significant software footwork, many are still difficult to program and could leave efficiencies on the table. This can be addressed through various model optimizations, but as some argue, the efficiency and utilization gaps can also be addressed with a tailored compiler.

Eugenio Culurciello, an electrical engineer at Purdue University, argues that getting full computational efficiency out of custom deep learning accelerators is difficult. This prompted his team at Purdue to build an FPGA based accelerator that could be agnostic to CNN

Wrenching Efficiency Out of Custom Deep Learning Accelerators was written by Nicole Hemsoth at The Next Platform.


          Accelerating Deep Learning Insights With GPU-Based Systems        

Explosive data growth and a rising demand for real-time analytics are making high performance computing (HPC) technologies increasingly vital to success. Organizations across all industries are seeking the next generation of IT solutions to facilitate scientific research, enhance national security, ensure economic stability, and empower innovation to face the challenges of today and tomorrow.

HPC solutions are key to quickly answering some of the world’s most daunting questions. From Tesla’s self-driving car to quantum computing, artificial intelligence (AI) is enabling unparalleled compute capabilities and outmatching humans at many cognitive tasks. Deep learning, an advanced AI technique, is growing in popularity

Accelerating Deep Learning Insights With GPU-Based Systems was written by Timothy Prickett Morgan at The Next Platform.


          Microsoft acquires Maluuba, a deep learning startup in Montreal        
Microsoft has announced the acquisition of a Montreal-based startup named Maluuba. The acquisition seems to revolve around Maluuba’s natural language work, though the startup works on more than just that, ultimately focusing on the development of artificial intelligence capable of thinking and speaking like a human. Microsoft says that Maluuba’s vision is “exactly in line with ours.” Microsoft describes this … Continue reading
          Hadoop Weekly — Issue 168        

Hadoop Weekly, Issue 168

Compiled by the Platform and Big Data group at Venustech (启明星辰)

May 1, 2016

The Kafka Summit took place in San Francisco this week, so it is no surprise that this issue contains a lot of Kafka material. Beyond that, there are plenty of articles on Impala performance, Kudu, and Druid. In other news, Apache Apex has become a top-level Apache project, and Qubole has open-sourced its StreamX project.

Technical News

This post takes a quick look at which operations on a Spark RDD preserve the existing data partitioning and which may create a new one. In particular, `mapValues` and `filter` preserve the partitioner, while `map` does not.

https://medium.com/@corentinanjuna/apache-spark-rdd-partitioning-preservation-2187a93bc33e
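
To make the partitioning behavior concrete, here is a minimal PySpark sketch (my own illustration, not code from the linked post; it assumes a local Spark installation and a key-value RDD):

# Which transformations keep the partitioner of a pair RDD?
from pyspark import SparkContext

sc = SparkContext("local[2]", "partitioning-demo")
pairs = sc.parallelize([("a", 1), ("b", 2), ("c", 3)]).partitionBy(2)

print(pairs.partitioner is not None)                                      # True: explicitly partitioned
print(pairs.mapValues(lambda v: v * 10).partitioner is not None)          # True: keys untouched, partitioner kept
print(pairs.filter(lambda kv: kv[1] > 1).partitioner is not None)         # True: filtering never moves keys
print(pairs.map(lambda kv: (kv[0], kv[1] * 10)).partitioner is not None)  # False: map may change keys, so Spark drops it

Preserving the partitioner matters because a later `reduceByKey` or `join` on the same keys can then avoid a full shuffle.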

 


This post shows how to use Conda to build a self-contained Python environment (including packages such as pandas) that can be shipped to cluster nodes as part of a Spark job. With this approach you can run a PySpark job even when the required Python packages are not installed on the hosts' operating system. The same approach also works for SparkR.

http://quasiben.github.io/blog/2016/4/15/conda-spark/

 

The Datadog blog has a three-part series on monitoring Kafka. The first part gives a detailed overview of the key metrics for brokers, producers, consumers, and ZooKeeper. The second covers how to inspect those metrics over JMX with JConsole and other tools, and the third describes the Datadog integration.

https://www.datadoghq.com/blog/monitoring-kafka-performance-metrics/

 

Salesforce has written about how Kafka grew inside their organization. Initially they used Kafka to power analysis of operational metrics; over time it became a platform driving many systems. Salesforce runs Kafka across multiple data centers and uses MirrorMaker to replicate and aggregate data between clusters.

https://medium.com/salesforce-engineering/expanding-visibility-with-apache-kafka-e305b12c4aba#.5k7j921o3

 

The Metamarkets blog has an interesting post on optimizing a large-scale distributed system. Druid, their distributed data store, recently added a first-in, first-out query processing mode, which they tested on a large, heavily loaded cluster. The post walks through their hypotheses about what might happen and the interesting metrics they collected along the way.

https://metamarkets.com/2016/impact-on-query-speed-from-forced-processing-ordering-in-druid/

 

The Google Cloud Big Data blog describes Capacitor, BigQuery's internal storage format, and the other optimizations that make storing data more efficient.

https://cloud.google.com/blog/big-data/2016/04/inside-capacitor-bigquerys-next-generation-columnar-storage-format

 

The Apache Kudu (incubating) blog summarizes recent performance analysis and tuning of the system using the YCSB tool.

http://getkudu.io/2016/04/26/ycsb.html

 

Impala 2.5 brings significant performance improvements, both on TPC benchmarks and elsewhere. The improvements include runtime filters, LLVM code generation for `SORT` and `DECIMAL`, faster metadata-only queries, and more.

http://blog.cloudera.com/blog/2016/04/apache-impala-incubating-in-cdh-5-7-4x-faster-for-bi-workloads-on-apache-hadoop/

 

This article describes how to configure MariaDB for the Hive Metastore in order to support high availability.

https://developer.ibm.com/hadoop/blog/2016/04/26/bigsql-ha-configure-ha-hive-metastore-db-using-mariadb10-1/

 

The Altiscale blog describes the process of hunting down a NodeGroup-related bug (a follow-up to an article from March). If you have ever been discouraged by not finding the root cause of a bug in Hadoop (or another distributed system), take heart: this post shows that it really is hard, and can take engineers working at a company that sells Hadoop as a service to get to the bottom of it.

https://www.altiscale.com/blog/part-1-2-investigation-analysis-and-resolution-of-nodegroup-performance-issues-on-bare-metal-hardware-clusters/

 

Netflix now runs more than 4,000 Kafka brokers across 36 clusters. Running Kafka in the cloud requires some trade-offs, and the team balances cost against data loss (less than 0.01% of data lost per day). This post shares the team's experience running Kafka on AWS: typical problems, deployment strategy (small clusters, isolated ZooKeeper clusters), cluster-level fault tolerance, support for AWS availability zones, a Kafka UI for visualization, and more.

http://techblog.netflix.com/2016/04/kafka-inside-keystone-pipeline.html

 

The Amazon Big Data blog describes how data written from Amazon EMR can be encrypted when it is stored in S3. The integration supports both client-side and server-side encryption (with the help of Amazon KMS).

http://blogs.aws.amazon.com/bigdata/post/TxBQTAF3X7VLEP/Process-Encrypted-Data-in-Amazon-EMR-with-Amazon-S3-and-AWS-KMS

 

TubeMogul describes the history of its big data platform, which handles a trillion analytics requests per month. The team adopted Amazon EMR early on, brought in Storm for real-time processing, and eventually moved its big data services onto Qubole.

https://www.tubemogul.com/engineering/the-big-data-lifecycle-at-tubemogul/

 

Caffe, the deep learning framework, has been integrated with Spark as CaffeOnSpark. MapR describes how to run it on a MapR cluster with YARN, including the performance optimizations they applied.

https://www.mapr.com/blog/distributed-deep-learning-caffe-using-mapr-cluster

 

Other News

Apache Apex, a stream and batch processing system for big data, is now a top-level project of the Apache Software Foundation. Apex entered the incubator last August.

https://blogs.apache.org/foundation/entry/the_apache_software_foundation_announces90

 

Heroku Kafka, a managed Kafka service from Heroku, is approaching its beta release.

https://blog.heroku.com/archives/2016/4/26/announcing-heroku-kafka-early-access

 

A post on the MapR blog highlights why gender diversity matters and mentions the Women in Big Data forum, with the aim of encouraging more women to enter the field. A Women in Big Data event was held at MapR in San Jose this week.

https://www.mapr.com/blog/case-women-big-data

 

Releases

StreamX is an open-source project from Qubole that copies data from Kafka into destination stores such as Amazon S3. Qubole also offers StreamX as a managed service.

http://www.qubole.com/blog/big-data/streamx/

 

SnappyData is a new platform (and company) for running OLAP and OLTP queries on streaming data. SnappyData is powered by Apache Spark and GemFire's in-memory storage technology.

http://www.infoworld.com/article/3062022/sql/apache-spark-powers-live-sql-analytics-in-snappydata.html

http://www.snappydata.io/

 

Apache Geode (incubating), a distributed data platform aimed at high performance and low latency, has released version 1.0.0-incubating.M2. The new version adds features such as peer-to-peer connectivity over a WAN.

http://mail-archives.apache.org/mod_mbox/incubator-geode-dev/201604.mbox/%3CCAFh%2B7k2eiK2TMGKsLqrY9CZDjxjYwiuTQ4QGUVC2s3geyJYwnA%40mail.gmail.com%3E

 

Apache Knox, a REST API gateway for Hadoop, has released version 0.9.0. The new version adds UI support for Ranger and Ambari, along with a number of other improvements and bug fixes.

http://mail-archives.us.apache.org/mod_mbox/www-announce/201604.mbox/%3CCACRbFyjRF7zShb-NQ29d3FJ0hKZ57ts0Qfo31ffuNODpskwqPQ@mail.gmail.com%3E

 

Events

China

(none)



Rosen 2016-05-07 23:37

          Deep Learning: A Practitioner’s Approach        

eBook Details: Paperback: 536 pages Publisher: WOW! eBook; 1st edition (August 20, 2017) Language: English ISBN-10: 1491914254 ISBN-13: 978-1491914250 eBook Description: Deep Learning: A Practitioner’s Approach

The post Deep Learning: A Practitioner’s Approach appeared first on eBookee: Free eBooks Download.


          What’s Next For Deep Learning?        

There are a lot of things that are next for deep learning. Instead of thinking of moving forward in one direction, think of expanding outward in many directions: Better reinforcement learning/integration of deep learning and reinforcement learning. Reinforcement learning algorithms that can reliably learn how to control robots, etc. Better generative models. Algorithms that can […]

The post What’s Next For Deep Learning? appeared first on jKool.


          Tetra raises a $1.5M seed round to bring deep learning to voice transcription        
 There are a million and one services for voice transcription on the market. But even with just one job to do, I’ve never seen a service that can handle the long tail of vocabulary used in the real world. This is particularly challenging if you’re a startup trying to sell your service to enterprises that rely on accurate transcription for their operations. Read More

          Gpu Benchmark: GTX 1080 vs. Titan X        

GTX 1080 vs Titan X

For serious gamers, the names GTX 1080 and Titan X represent the top of the range among the graphics cards (GPUs) currently on the market, and are therefore the goal for any self-respecting player. Both cards are made by Nvidia and are everything you could want from a graphics card: high performance and a high-end hardware configuration.

Technical Specifications

As mentioned, the hardware configuration of these cards is top class and, as we will see further on, even though the differences between the two GPUs are not large, only one of them came out clearly ahead after all the tests it was put through.

                    GTX 1080        Titan X
CUDA cores          2560            3584
Base clock          1.6 GHz         1.42 GHz
Boost clock         1.73 GHz        1.53 GHz
Memory              8 GB GDDR5X     12 GB GDDR5X
Memory bandwidth    320 GB/s        480 GB/s
TDP                 180 W           250 W
Processor           GP104           GP102
Transistors         7.2bn           12bn

Price

The prices of these cards reflect the performance they can deliver, and current prices are very high: around 1,200 euros for the Titan X, while the GTX 1080 costs about 800 euros.

Amazon Deals

We have picked out two interesting Amazon deals for these two little gems of technology. If you are a videogame fan and only like playing at maximum resolution and maximum smoothness, one of these cards could be the perfect present to put under the tree this Christmas:
 

Benchmark

Now we come to the heart of this article: the results these two cards achieved in the various GPU benchmarks. The buzz around the results had already started this summer, since until then the only known figures for the GTX 1080 and the Titan X were lab numbers published by Nvidia itself. The numbers were interesting, but anyone who follows this kind of news knows that measured results usually differ from what the manufacturer declares, so as soon as it was possible to get hold of these two Nvidia cards, several hardware sites compared them using the classic 3DMark test. The 3DMark results for the two cards can be viewed at this URL, and the ranking is reported below. 3D Mark: Titan X vs GTX 1080

This kind of comparison is perfectly fine for deciding which card will perform better for gaming, but not everyone knows that top-end graphics cards like the Titan X and the GTX 1080 also have applications in newer disciplines such as machine learning or, more specifically, deep learning. As it happens, while searching the web for news on the subject, we found that an Italian company working in these fields, Add-For of Turin, recently published its own study of the performance of these two cards on deep learning algorithms, tested with several libraries such as TensorFlow, Caffe, and Neon. Those deep learning benchmarks gave slightly different results, crowning the Titan X as the undisputed queen of graphics cards.

These are certainly two cards with very high capabilities, able to delight any gamer. If, beyond gaming, you are also interested in more professional applications such as deep learning, then the Titan X is the best choice! Add-For will soon release new benchmarks for graphics cards such as the Nvidia Tesla K40 and K80, which will be installed in its high-performance HPC systems.
          æ·±åº¦å­¸ç¿’為什麼會這麼厲害?Yann LeCun 提供的三個觀點        

Deep learning is probably one of the main reasons machine learning has shone so brightly in applications in recent years. From images and speech to the recognition and classification of all kinds of data, the technical breakthroughs brought by deep learning have let computers reach a level close to, and sometimes beyond, human performance.

But why is deep learning so powerful? There is still no complete explanation, but a KDnuggets post, "3 Thoughts on Why Deep Learning Works So Well", excerpts some remarks from Yann LeCun (one of the main contributors to the convolutional neural network) that suggest some possible lines of thought.

At the end of July, Yann LeCun held an online Q&A on the knowledge site Quora, and one of the questions was "When will we see a theoretical background and mathematical foundation for deep learning?".

LeCun's reply contains several insights that are well worth considering:

  • In high-dimensional spaces, local minima are unlikely to be a problem
  • Multi-layer mathematical structures can describe complex functions more compactly
  • ConvNets are very effective for particular types of data


Local minima are a problem frequently encountered in mathematical optimization, and since most machine learning algorithms include an optimization step, they inherit the same difficulty and usually need repeated parameter tuning to overcome it. Deep learning algorithms contain an extremely large number of variables, which amounts to optimizing in a very high-dimensional space, and that in turn provides a way to escape local minima.

The convolutional neural network, as one branch of deep learning, performs better on image recognition, or more broadly on 'spatial signals', than on other types of data, which to some extent reflects the fact that particular kinds of data have mathematical models better suited to describing them. A more detailed explanation of this point can be found in the paper "Invariant scattering convolution networks".


In another question from the same Quora session, about recent developments in deep learning worth watching, LeCun mentioned the development of GANs (Generative Adversarial Networks), which is actually a concept very close to the co-evolutionary algorithms I worked on for a while, but that topic deserves a separate post.





          ä¸€ç”²å­çš„等待:從類神經網路到深度學習        
'Deep learning' has probably been one of the hottest technical topics of the past two years. At a Deep Learning Workshop recently held by Academia Sinica, Academician 莊炳煌 posed a thought-provoking question in his keynote: what is now widely discussed as 'deep learning' really means deep neural networks, and today's deep neural networks are, in their theoretical architecture, not very different from the neural networks proposed by connectionist researchers more than 50 years ago. So what exactly made us wait nearly 60 years?

From a historical point of view, what we have now, compared with 60 years ago, is enormous computing power and a nearly inexhaustible supply of data, and much of the recent progress in deep learning has indeed been built on those foundations. The earliest neural network theory never actually restricted how many 'layers' there should be between input and output; it was the technical limits of computation that kept neural network applications to one to three layers for more than 50 years. Only in the last decade have new computing architectures appeared that took neural networks from fewer than 5 layers to 10 and 20 layers, all the way to the 200-plus layers of the winner of the 2015 image recognition competition.

So is the breakthrough we waited 60 years for really just the engineering ability to stack more layers in a neural network?

The academician is not the only one to raise this question. Vladimir Vapnik (a well-known scholar in statistical learning and one of the inventors of the SVM) asked, at the Yandex School of Data Analysis Conference, Machine Learning: Prospects and Applications, held in Berlin in 2015: 'Does deep learning come from the devil?'

In Vapnik's view, how do you tell whether an idea comes from God or from the devil? 'God is clever, while the devil is not.' And at this stage deep neural networks still rely, to a large degree, on 'massive computation' and 'massive data', a 'brute force' approach. So there is still plenty of room for improvement.

These grandmasters' assessments of deep learning are not attacks on the field but reflections on a more fundamental question: 'What, concretely, is the human contribution in this progress?' (At the grandmaster level, what one thinks about is usually the most fundamental issue: the mathematics.)

Admittedly, getting neural network architectures to reach many layers is a real engineering challenge, but does that progress give us a deeper understanding of statistical learning itself? (That is, do we know more clearly under what kind of mathematical structure machine learning can do better? As mentioned above, deep neural networks are not that different in mathematical structure from the perceptron proposed in 1957.)

The academician believes that 'deep learning' should not be confined to deep neural networks, but should be extended further to learning on manifolds (manifold learning), so that machine learning can not only make 'predictions' from data but also offer 'explanations'.

Of course, this is one of the ultimate goals of artificial intelligence, and once it is achieved most of today's experts will be out of a job as well. So waiting a few more years doesn't seem like such a bad thing, does it?




          Tetra raises a $1.5M seed round to bring deep learning to voice transcription        
 There are a million and one services for voice transcription on the market. But even with just one job to do, I’ve never seen a service that can handle the long tail of vocabulary used in the real world. This is particularly challenging if you’re a startup trying to sell your service to enterprises that rely on accurate transcription for their operations. Jon Goldsmith, co-founder… Read More

          The Advanced Guide to Deep Learning and Artificial Intelligence Bundle for $42        
This High-Intensity 14.5 Hour Bundle Will Help You Help Computers Address Some of Humanity's Biggest Problems
Expires November 28, 2021 23:59 PST
Buy now and get 91% off

Deep Learning: Convolutional Neural Networks in Python


KEY FEATURES

In this course, intended to expand upon your knowledge of neural networks and deep learning, you'll harness these concepts for computer vision using convolutional neural networks. Going in-depth on the concept of convolution, you'll discover its wide range of applications, from generating image effects to modeling artificial organs.

  • Access 25 lectures & 3 hours of content 24/7
  • Explore the StreetView House Number (SVHN) dataset using convolutional neural networks (CNNs)
  • Build convolutional filters that can be applied to audio or imaging
  • Extend deep neural networks w/ just a few functions
  • Test CNNs written in both Theano & TensorFlow
Note: we strongly recommend taking The Deep Learning & Artificial Intelligence Introductory Bundle before this course.

PRODUCT SPECS

Details & Requirements

  • Length of time users can access this course: lifetime
  • Access options: web streaming, mobile streaming
  • Certification of completion not included
  • Redemption deadline: redeem your code within 30 days of purchase
  • Experience level required: all levels, but you must have some knowledge of calculus, linear algebra, probability, Python, Numpy, and be able to write a feedforward neural network in Theano and TensorFlow.
  • All code for this course is available for download here, in the directory cnn_class

Compatibility

  • Internet required

THE EXPERT

The Lazy Programmer is a data scientist, big data engineer, and full stack software engineer. For his master's thesis he worked on brain-computer interfaces using machine learning. These assist non-verbal and non-mobile persons to communicate with their family and caregivers.

He has worked in online advertising and digital media as both a data scientist and big data engineer, and built various high-throughput web services around said data. He has created new big data pipelines using Hadoop/Pig/MapReduce, and created machine learning models to predict click-through rate, news feed recommender systems using linear regression, Bayesian Bandits, and collaborative filtering and validated the results using A/B testing.

He has taught undergraduate and graduate students in data science, statistics, machine learning, algorithms, calculus, computer graphics, and physics for students attending universities such as Columbia University, NYU, Humber College, and The New School.

Multiple businesses have benefitted from his web programming expertise. He does all the backend (server), frontend (HTML/JS/CSS), and operations/deployment work. Some of the technologies he has used are: Python, Ruby/Rails, PHP, Bootstrap, jQuery (Javascript), Backbone, and Angular. For storage/databases he has used MySQL, Postgres, Redis, MongoDB, and more.

Unsupervised Deep Learning in Python


KEY FEATURES

In this course, you'll dig deep into deep learning, discussing principal components analysis and a popular nonlinear dimensionality reduction technique known as t-distributed stochastic neighbor embedding (t-SNE). From there you'll learn about a special type of unsupervised neural network called the autoencoder, understanding how to link many together to get a better performance out of deep neural networks.

  • Access 30 lectures & 3 hours of content 24/7
  • Discuss restricted Boltzmann machines (RBMs) & how to pretrain supervised deep neural networks
  • Learn about Gibbs sampling
  • Use PCA & t-SNE on features learned by autoencoders & RBMs
  • Understand the most modern deep learning developments

PRODUCT SPECS

Details & Requirements

  • Length of time users can access this course: lifetime
  • Access options: web streaming, mobile streaming
  • Certification of completion not included
  • Redemption deadline: redeem your code within 30 days of purchase
  • Experience level required: intermediate, but you must have some knowledge of calculus, linear algebra, probability, Python, Numpy, and be able to write a feedforward neural network in Theano and TensorFlow.
  • All code for this course is available for download here, in the directory unsupervised_class2

Compatibility

  • Internet required

THE EXPERT

The Lazy Programmer is a data scientist, big data engineer, and full stack software engineer. For his master's thesis he worked on brain-computer interfaces using machine learning. These assist non-verbal and non-mobile persons to communicate with their family and caregivers.

He has worked in online advertising and digital media as both a data scientist and big data engineer, and built various high-throughput web services around said data. He has created new big data pipelines using Hadoop/Pig/MapReduce, and created machine learning models to predict click-through rate, news feed recommender systems using linear regression, Bayesian Bandits, and collaborative filtering and validated the results using A/B testing.

He has taught undergraduate and graduate students in data science, statistics, machine learning, algorithms, calculus, computer graphics, and physics for students attending universities such as Columbia University, NYU, Humber College, and The New School.

Multiple businesses have benefitted from his web programming expertise. He does all the backend (server), frontend (HTML/JS/CSS), and operations/deployment work. Some of the technologies he has used are: Python, Ruby/Rails, PHP, Bootstrap, jQuery (Javascript), Backbone, and Angular. For storage/databases he has used MySQL, Postgres, Redis, MongoDB, and more.

Deep Learning: Recurrent Neural Networks in Python


KEY FEATURES

A recurrent neural network is a class of artificial neural network where connections form a directed cycle, using their internal memory to process arbitrary sequences of inputs. This makes them capable of tasks like handwriting and speech recognition. In this course, you'll explore this extremely expressive facet of deep learning and get up to speed on this revolutionary new advance.

  • Access 32 lectures & 4 hours of content 24/7
  • Get introduced to the Simple Recurrent Unit, also known as the Elman unit
  • Extend the XOR problem as a parity problem
  • Explore language modeling
  • Learn Word2Vec to create word vectors or word embeddings
  • Look at the long short-term memory unit (LSTM), & gated recurrent unit (GRU)
  • Apply what you learn to practical problems like learning a language model from Wikipedia data

PRODUCT SPECS

Details & Requirements

  • Length of time users can access this course: lifetime
  • Access options: web streaming, mobile streaming
  • Certification of completion not included
  • Redemption deadline: redeem your code within 30 days of purchase
  • Experience level required: all levels, but you must have some knowledge of calculus, linear algebra, probability, Python, Numpy, and be able to write a feedforward neural network in Theano and TensorFlow.
  • All code for this course is available for download here, in the directory rnn_class

Compatibility

  • Internet required

THE EXPERT

The Lazy Programmer is a data scientist, big data engineer, and full stack software engineer. For his master's thesis he worked on brain-computer interfaces using machine learning. These assist non-verbal and non-mobile persons to communicate with their family and caregivers.

He has worked in online advertising and digital media as both a data scientist and big data engineer, and built various high-throughput web services around said data. He has created new big data pipelines using Hadoop/Pig/MapReduce, and created machine learning models to predict click-through rate, news feed recommender systems using linear regression, Bayesian Bandits, and collaborative filtering and validated the results using A/B testing.

He has taught undergraduate and graduate students in data science, statistics, machine learning, algorithms, calculus, computer graphics, and physics for students attending universities such as Columbia University, NYU, Humber College, and The New School.

Multiple businesses have benefitted from his web programming expertise. He does all the backend (server), frontend (HTML/JS/CSS), and operations/deployment work. Some of the technologies he has used are: Python, Ruby/Rails, PHP, Bootstrap, jQuery (Javascript), Backbone, and Angular. For storage/databases he has used MySQL, Postgres, Redis, MongoDB, and more.

Natural Language Processing with Deep Learning in Python


KEY FEATURES

In this course you'll explore advanced natural language processing - the field of computer science and AI that concerns interactions between computer and human languages. Over the course you'll learn four new NLP architectures and explore classic NLP problems like parts-of-speech tagging and named entity recognition, and use recurrent neural networks to solve them. By course's end, you'll have a firm grasp on natural language processing and its many applications.

  • Access 40 lectures & 4.5 hours of content 24/7
  • Discover Word2Vec & how it maps words to a vector space
  • Explore GLoVe's use of matrix factorization & how it contributes to recommendation systems
  • Learn about recursive neural networks which will help solve the problem of negation in sentiment analysis

PRODUCT SPECS

Details & Requirements

  • Length of time users can access this course: lifetime
  • Access options: web streaming, mobile streaming
  • Certification of completion not included
  • Redemption deadline: redeem your code within 30 days of purchase
  • Experience level required: advanced, but you must have some knowledge of calculus, linear algebra, probability, Python, Numpy, and be able to write a feedforward neural network in Theano and TensorFlow.
  • All code for this course is available for download here, in the directory nlp_class2

Compatibility

  • Internet required

THE EXPERT

The Lazy Programmer is a data scientist, big data engineer, and full stack software engineer. For his master's thesis he worked on brain-computer interfaces using machine learning. These assist non-verbal and non-mobile persons to communicate with their family and caregivers.

He has worked in online advertising and digital media as both a data scientist and big data engineer, and built various high-throughput web services around said data. He has created new big data pipelines using Hadoop/Pig/MapReduce, and created machine learning models to predict click-through rate, news feed recommender systems using linear regression, Bayesian Bandits, and collaborative filtering and validated the results using A/B testing.

He has taught undergraduate and graduate students in data science, statistics, machine learning, algorithms, calculus, computer graphics, and physics for students attending universities such as Columbia University, NYU, Humber College, and The New School.

Multiple businesses have benefitted from his web programming expertise. He does all the backend (server), frontend (HTML/JS/CSS), and operations/deployment work. Some of the technologies he has used are: Python, Ruby/Rails, PHP, Bootstrap, jQuery (Javascript), Backbone, and Angular. For storage/databases he has used MySQL, Postgres, Redis, MongoDB, and more.

          Practical Deep Learning in Theano and TensorFlow for $29        
Build & Understand Neural Networks Using Two of the Most Popular Deep Learning Techniques
Expires November 02, 2021 23:59 PST
Buy now and get 75% off

KEY FEATURES

The applications of Deep Learning are many, and constantly growing, just like the neural networks that it supports. In this course, you'll delve into advanced concepts of Deep Learning, starting with the basics of TensorFlow and Theano, understanding how to build neural networks with these popular tools. Using these tools, you'll learn how to build and understand a neural network, knowing exactly how to visualize what is happening within a model as it learns.

  • Access 23 lectures & 3 hours of programming 24/7
  • Discover batch & stochastic gradient descent, two techniques that allow you to train on a small sample of data at each iteration, greatly speeding up training time
  • Discuss how momentum can carry you through local minima
  • Learn adaptive learning rate techniques like AdaGrad & RMSprop
  • Explore dropout regularization & other modern neural network techniques
  • Understand the variables & expressions of TensorFlow & Theano
  • Set up a GPU-instance on AWS & compare the speed of CPU vs GPU for training a deep neural network
  • Look at the MNIST dataset & compare against known benchmarks
Like what you're learning? Try out the The Advanced Guide to Deep Learning and Artificial Intelligence next.

PRODUCT SPECS

Details & Requirements

  • Length of time users can access this course: lifetime
  • Access options: web streaming, mobile streaming
  • Certification of completion not included
  • Redemption deadline: redeem your code within 30 days of purchase
  • Experience level required: all levels, but you must have some knowledge of calculus, linear algebra, probability, Python, and Numpy
  • All code for this course is available for download here, in the directory ann_class2

Compatibility

  • Internet required

THE EXPERT

The Lazy Programmer is a data scientist, big data engineer, and full stack software engineer. For his master's thesis he worked on brain-computer interfaces using machine learning. These assist non-verbal and non-mobile persons to communicate with their family and caregivers.

He has worked in online advertising and digital media as both a data scientist and big data engineer, and built various high-throughput web services around said data. He has created new big data pipelines using Hadoop/Pig/MapReduce, and created machine learning models to predict click-through rate, news feed recommender systems using linear regression, Bayesian Bandits, and collaborative filtering and validated the results using A/B testing.

He has taught undergraduate and graduate students in data science, statistics, machine learning, algorithms, calculus, computer graphics, and physics for students attending universities such as Columbia University, NYU, Humber College, and The New School.

Multiple businesses have benefitted from his web programming expertise. He does all the backend (server), frontend (HTML/JS/CSS), and operations/deployment work. Some of the technologies he has used are: Python, Ruby/Rails, PHP, Bootstrap, jQuery (Javascript), Backbone, and Angular. For storage/databases he has used MySQL, Postgres, Redis, MongoDB, and more.

          The Deep Learning and Artificial Intelligence Introductory Bundle for $39        
Companies Are Relying on Artificial Intelligence to Learn Faster Than Ever. Time to Catch Up.
Expires October 31, 2021 23:59 PST
Buy now and get 91% off

Deep Learning Prerequisites: Linear Regression in Python


KEY FEATURES

Deep Learning is a set of powerful algorithms that are the force behind self-driving cars, image searching, voice recognition, and many, many more applications we consider decidedly "futuristic." One of the central foundations of deep learning is linear regression; using probability theory to gain deeper insight into the "line of best fit." This is the first step to building machines that, in effect, act like neurons in a neural network as they learn while they're fed more information. In this course, you'll start with the basics of building a linear regression module in Python, and progress into practical machine learning issues that will provide the foundations for an exploration of Deep Learning.

  • Access 20 lectures & 2 hours of content 24/7
  • Use a 1-D linear regression to prove Moore's Law
  • Learn how to create a machine learning model that can learn from multiple inputs
  • Apply multi-dimensional linear regression to predict a patient's systolic blood pressure given their age & weight
  • Discuss generalization, overfitting, train-test splits, & other issues that may arise while performing data analysis
Like what you're learning? Try out the The Advanced Guide to Deep Learning and Artificial Intelligence next.

PRODUCT SPECS

Details & Requirements

  • Length of time users can access this course: lifetime
  • Access options: web streaming, mobile streaming
  • Certification of completion not included
  • Redemption deadline: redeem your code within 30 days of purchase
  • Experience level required: all levels, but you must have some knowledge of calculus, linear algebra, probability, Python, and Numpy
  • All code for this course is available for download here, in the directory linear_regression_class

Compatibility

  • Internet required

THE EXPERT

The Lazy Programmer is a data scientist, big data engineer, and full stack software engineer. For his master's thesis he worked on brain-computer interfaces using machine learning. These assist non-verbal and non-mobile persons to communicate with their family and caregivers.

He has worked in online advertising and digital media as both a data scientist and big data engineer, and built various high-throughput web services around said data. He has created new big data pipelines using Hadoop/Pig/MapReduce, and created machine learning models to predict click-through rate, news feed recommender systems using linear regression, Bayesian Bandits, and collaborative filtering and validated the results using A/B testing.

He has taught undergraduate and graduate students in data science, statistics, machine learning, algorithms, calculus, computer graphics, and physics for students attending universities such as Columbia University, NYU, Humber College, and The New School.

Multiple businesses have benefitted from his web programming expertise. He does all the backend (server), frontend (HTML/JS/CSS), and operations/deployment work. Some of the technologies he has used are: Python, Ruby/Rails, PHP, Bootstrap, jQuery (Javascript), Backbone, and Angular. For storage/databases he has used MySQL, Postgres, Redis, MongoDB, and more.

Deep Learning Prerequisites: Logistic Regression in Python


KEY FEATURES

Logistic regression is one of the most fundamental techniques used in machine learning, data science, and statistics, as it may be used to create a classification or labeling algorithm that quite resembles a biological neuron. Logistic regression units, by extension, are the basic bricks in the neural network, the central architecture in deep learning. In this course, you'll come to terms with logistic regression using practical, real-world examples to fully appreciate the vast applications of Deep Learning.

  • Access 31 lectures & 3 hours of content 24/7
  • Code your own logistic regression module in Python
  • Complete a course project that predicts user actions on a website given user data
  • Use Deep Learning for facial expression recognition
  • Understand how to make data-driven decisions
Like what you're learning? Try out the The Advanced Guide to Deep Learning and Artificial Intelligence next.

PRODUCT SPECS

Details & Requirements

  • Length of time users can access this course: lifetime
  • Access options: web streaming, mobile streaming
  • Certification of completion not included
  • Redemption deadline: redeem your code within 30 days of purchase
  • Experience level required: all levels, but you must have some knowledge of calculus, linear algebra, probability, Python, and Numpy
  • All code for this course is available for download here, in the directory logistic_regression_class

Compatibility

  • Internet required

THE EXPERT

The Lazy Programmer is a data scientist, big data engineer, and full stack software engineer. For his master's thesis he worked on brain-computer interfaces using machine learning. These assist non-verbal and non-mobile persons to communicate with their family and caregivers.

He has worked in online advertising and digital media as both a data scientist and big data engineer, and built various high-throughput web services around said data. He has created new big data pipelines using Hadoop/Pig/MapReduce, and created machine learning models to predict click-through rate, news feed recommender systems using linear regression, Bayesian Bandits, and collaborative filtering and validated the results using A/B testing.

He has taught undergraduate and graduate students in data science, statistics, machine learning, algorithms, calculus, computer graphics, and physics for students attending universities such as Columbia University, NYU, Humber College, and The New School.

Multiple businesses have benefitted from his web programming expertise. He does all the backend (server), frontend (HTML/JS/CSS), and operations/deployment work. Some of the technologies he has used are: Python, Ruby/Rails, PHP, Bootstrap, jQuery (Javascript), Backbone, and Angular. For storage/databases he has used MySQL, Postgres, Redis, MongoDB, and more.

Data Science: Deep Learning in Python


KEY FEATURES

Artificial neural networks are the architecture that make Apple's Siri recognize your voice, Tesla's self-driving cars know where to turn, Google Translate learn new languages, and so many more technological features you have quite possibly taken for granted. The data science that unites all of them is Deep Learning. In this course, you'll build your very first neural network, going beyond basic models to build networks that automatically learn features.

  • Access 37 lectures & 4 hours of content 24/7
  • Extend the binary classification model to multiple classes using the softmax function (a short softmax sketch follows this listing)
  • Code the important training method, backpropagation, in Numpy
  • Implement a neural network using Google's TensorFlow library
  • Predict user actions on a website given user data using a neural network
  • Use Deep Learning for facial expression recognition
  • Learn some of the newest developments in neural networks
Like what you're learning? Try out the The Advanced Guide to Deep Learning and Artificial Intelligence next.
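
As a quick illustration of the softmax step mentioned in the listing above (my own Numpy sketch, not material from the course), softmax turns a vector of raw class scores into probabilities that sum to one:

# Softmax generalizes the two-class logistic sigmoid to many classes.
import numpy as np

def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    shifted = scores - np.max(scores, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

scores = np.array([2.0, 1.0, 0.1])   # raw network outputs for 3 classes
probs = softmax(scores)
print(probs, probs.sum())            # approx. [0.659 0.242 0.099] 1.0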

PRODUCT SPECS

Details & Requirements

  • Length of time users can access this course: lifetime
  • Access options: web streaming, mobile streaming
  • Certification of completion not included
  • Redemption deadline: redeem your code within 30 days of purchase
  • Experience level required: intermediate, but you must have some knowledge of calculus, linear algebra, probability, Python, and Numpy
  • All code for this course is available for download here, in the directory ann_class

Compatibility

  • Internet required

THE EXPERT

The Lazy Programmer is a data scientist, big data engineer, and full stack software engineer. For his master's thesis he worked on brain-computer interfaces using machine learning. These assist non-verbal and non-mobile persons to communicate with their family and caregivers.

He has worked in online advertising and digital media as both a data scientist and big data engineer, and built various high-throughput web services around said data. He has created new big data pipelines using Hadoop/Pig/MapReduce, and created machine learning models to predict click-through rate, news feed recommender systems using linear regression, Bayesian Bandits, and collaborative filtering and validated the results using A/B testing.

He has taught undergraduate and graduate students in data science, statistics, machine learning, algorithms, calculus, computer graphics, and physics for students attending universities such as Columbia University, NYU, Humber College, and The New School.

Multiple businesses have benefitted from his web programming expertise. He does all the backend (server), frontend (HTML/JS/CSS), and operations/deployment work. Some of the technologies he has used are: Python, Ruby/Rails, PHP, Bootstrap, jQuery (Javascript), Backbone, and Angular. For storage/databases he has used MySQL, Postgres, Redis, MongoDB, and more.

Data Science: Practical Deep Learning in Theano & TensorFlow


KEY FEATURES

The applications of Deep Learning are many, and constantly growing, just like the neural networks that it supports. In this course, you'll delve into advanced concepts of Deep Learning, starting with the basics of TensorFlow and Theano, understanding how to build neural networks with these popular tools. Using these tools, you'll learn how to build and understand a neural network, knowing exactly how to visualize what is happening within a model as it learns.

  • Access 23 lectures & 3 hours of programming 24/7
  • Discover batch & stochastic gradient descent, two techniques that allow you to train on a small sample of data at each iteration, greatly speeding up training time
  • Discuss how momentum can carry you through local minima
  • Learn adaptive learning rate techniques like AdaGrad & RMSprop
  • Explore dropout regularization & other modern neural network techniques
  • Understand the variables & expressions of TensorFlow & Theano
  • Set up a GPU-instance on AWS & compare the speed of CPU vs GPU for training a deep neural network
  • Look at the MNIST dataset & compare against known benchmarks
Like what you're learning? Try out the The Advanced Guide to Deep Learning and Artificial Intelligence next.

PRODUCT SPECS

Details & Requirements

  • Length of time users can access this course: lifetime
  • Access options: web streaming, mobile streaming
  • Certification of completion not included
  • Redemption deadline: redeem your code within 30 days of purchase
  • Experience level required: all levels, but you must have some knowledge of calculus, linear algebra, probability, Python, and Numpy
  • All code for this course is available for download here, in the directory ann_class2

Compatibility

  • Internet required

THE EXPERT

The Lazy Programmer is a data scientist, big data engineer, and full stack software engineer. For his master's thesis he worked on brain-computer interfaces using machine learning. These assist non-verbal and non-mobile persons to communicate with their family and caregivers.

He has worked in online advertising and digital media as both a data scientist and big data engineer, and built various high-throughput web services around said data. He has created new big data pipelines using Hadoop/Pig/MapReduce, and created machine learning models to predict click-through rate, news feed recommender systems using linear regression, Bayesian Bandits, and collaborative filtering and validated the results using A/B testing.

He has taught undergraduate and graduate students in data science, statistics, machine learning, algorithms, calculus, computer graphics, and physics for students attending universities such as Columbia University, NYU, Humber College, and The New School.

Multiple businesses have benefitted from his web programming expertise. He does all the backend (server), frontend (HTML/JS/CSS), and operations/deployment work. Some of the technologies he has used are: Python, Ruby/Rails, PHP, Bootstrap, jQuery (Javascript), Backbone, and Angular. For storage/databases he has used MySQL, Postgres, Redis, MongoDB, and more.

          Introducing Cisco HyperFlex for GPU-Accelerated VDI at NVIDIA GTC Amsterdam        
NVIDIA is holding the first European edition of its successful GPU Technology Conference next week in Amsterdam. Cisco is excited to participate as a Platinum sponsor. We will have subject-matter experts on hand to talk to you about deep-learning, machine learning, virtualized graphics workstations and the infrastructure needed to support these. We’re excited about introducing […]
          IBM speeds deep learning by using multiple servers        

For everyone frustrated by how long it takes to train deep learning models, IBM has some good news: It has unveiled a way to automatically split deep-learning training jobs across multiple physical servers -- not just individual GPUs, but whole systems with their own separate sets of GPUs.

Now the bad news: It's available only in IBM's PowerAI 4.0 software package, which runs exclusively on IBM's own OpenPower hardware systems.

Distributed Deep Learning (DDL) doesn't require developers to learn an entirely new deep learning framework. It repackages several common frameworks for machine learning: TensorFlow, Torch, Caffe, Chainer, and Theano. Deep learning projects that use those frameworks can then run in parallel across multiple hardware nodes.
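
To picture what running in parallel across multiple nodes means in general, here is a toy data-parallel sketch in plain Numpy. It is emphatically not IBM's DDL API; it only simulates the basic idea of per-worker gradients being averaged before one shared weight update:

# Toy data-parallel training: each shard of the data plays the role of a worker node.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
true_w = rng.normal(size=10)
y = X @ true_w + 0.01 * rng.normal(size=1000)

n_workers = 4
shards = np.array_split(np.arange(len(X)), n_workers)

w = np.zeros(10)
lr = 0.1
for step in range(100):
    grads = []
    for idx in shards:                       # pretend each iteration runs on a different server
        Xs, ys = X[idx], y[idx]
        err = Xs @ w - ys
        grads.append(Xs.T @ err / len(idx))  # local gradient of the squared error on this shard
    w -= lr * np.mean(grads, axis=0)         # "all-reduce": average the gradients, one shared update

print(np.allclose(w, true_w, atol=0.05))     # True: the averaged updates converge

In real systems the averaging is done with collective communication across GPUs and servers, which is where most of the engineering effort goes.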



          In Too Deep: Learning to Ski Powder        
Last year, I missed the best month of snow so I was determined to ski Sundance Ladies Day this February. Day 1, I was not disappointed: 6 inches of deep powder. WOW! I have never skied that much deep powder. But, I am getting ahead of myself. In my last post, I discussed being OK …
           A Visit to Amazon's Headquarters in Seattle: The Invisible Giant         

Amazon dominates a large share of online retail, builds its own products, and is trying its hand at financial technologies such as Amazon Pay. But that is only the facade. Behind it hides a company pursuing a much bigger vision. t3n paid the giant a visit.

I can't see it. Slightly confused, I stand on the sidewalk in the middle of Seattle, in the Westlake district. Google Maps tells me I am in the right place. But all I can make out are a few buildings, autumnal Starbucks ads for pumpkin spice latte, and lots of trees with red and yellow leaves. It should be right here: one of the largest online retailers in the world, or more precisely Amazon's campus, the headquarters. But I can't see it.

Which is odd. I am looking for perhaps the most important e-commerce company in the world, a corporation that posted around 92 billion US dollars in revenue in the first nine months of 2016, employed more than 300,000 people in 2015, and counts more than 300 million customers worldwide. And I can't spot a single logo on the buildings around me.

Free bananas instead of glass palaces

It takes a banana stand to clear up my confusion. I know the 'Community Banana Stand' from my own research: a free-banana campaign run by Amazon. So I can't be entirely wrong. I discover the next clue on the building at 440 Terry Avenue North. 'Day One North', reads the lettering there, an allusion to the Jeff Bezos quote 'It's still day one', as any Amazon watcher knows. I open the door and step into the lobby, and there, finally, is the first logo.

Here in Seattle, founder Jeff Bezos's strategy is perhaps easiest to see: Amazon is everywhere, but you don't see it. Google, Apple, Facebook: they all build huge headquarters in Silicon Valley, one home base for everything, a pilgrimage site for the faithful, each bigger and more ostentatious than the last. Not Bezos. There is no real campus. His company is spread across the entire city in a total of 30 buildings; two more towers stand in the south and a third is currently under construction. Not one of them reveals from the outside that Amazon is inside. 'It's not our style to be flashy, so we don't need our logos on all of our buildings,' I am told at the front desk.

Subtle, unobtrusive, even reserved: that is how the company presents itself in its home town. This is not coyness but a mirror of the company's philosophy. Entirely inconspicuously yet purposefully, Amazon permeates not only Seattle but the everyday lives of millions of people around the world. Selling goods online is only the most obvious part: from fresh groceries to tools, customers find everything there. With Amazon Echo and the Kindle reader the company is also trying its hand at its own hardware, with Prime it is pushing into the streaming market, and with Amazon Pay into financial technology.

The entrance area of Day 1 North looks more like a coffee-shop counter than the lobby of the world's largest e-commerce retailer. At least there are treats for the dogs that employees are allowed to bring along. (Photo: Jochen Fuchs)
In the background, though, Amazon serves more than just consumers: with its marketplace it has built a huge B2B platform. Retailers can sell their products there as independent sellers, and companies such as the startup KW-Commerce have built entire business models on top of it. And for product search, Amazon has long since displaced Google. How much the company's portfolio has grown is also illustrated by the cloud business: with Amazon Web Services (AWS), the company took in almost nine billion US dollars in the first nine months of 2016, making the division roughly ten percent of total revenue. Above all, AWS dominates the infrastructure market: with 45 percent, Amazon holds more market share than Microsoft, Google, and IBM combined.

The eternal day one

But that is not enough for Jeff Bezos. The Amazon founder constantly pushes his employees to keep developing new ideas. The quote 'It's still day one' illustrates this particularly well: employees are supposed to think every day as if it were still the company's first day, as if everything could still be discovered anew, as if everything could still be overturned. Take risks, seize the opportunities that offer a chance at future market leadership. The customer is the focus, because that ultimately pays into the Amazon brand. 'With everything we do, we want to offer customers added value,' is how Patrick Gauthier, vice president of Amazon Pay, sums it up. The company is not following a secret master plan; it identifies customer needs.

Seattle and the surrounding area serve as the playground for this experimentation. Prime Now, the delivery service that now offers same-day delivery around the world, was first tried out in Washington State's largest city. This is where the Amazon Go supermarket launched: thanks to walk-out technology, customers simply pack up their goods and leave again without ever stepping up to a checkout. Payment happens automatically through the Amazon app, which uses cameras, sensors, and deep learning algorithms to know when the customer leaves the store and the purchase is complete.

Something new with books

And here in Seattle stands the first Amazon Bookstore, the company's approach to the analog world. At first glance the store in University Village looks like a perfectly normal bookshop, with dark leather armchairs and high-quality solid-wood furniture. The special features only show on second glance: the assortment is data-driven, put together from the preferences of customers in Seattle. Prices are not marked on the books but can be looked up with the Amazon app. The shelves display not just genres but also 'Best-selling nonfiction in the Northwest', 'The most popular first comics for beginners', 'Top rated with 4.8 stars and up', and 'If you like "Zero to One", you'll also like these'. And anyone who wants to pay can do so at the register with the Amazon app. Everything just like the online shop, only in real life.

Alongside the books, Amazon puts its own hardware front and center. Fire tablets in the Kids Edition stand in the children's section, and customers can leaf through books on the Kindle reader. Dedicated shelves and televisions present everything from Amazon Echo to the Fire TV Stick. The concept seems to have proven itself: within a year Amazon has also opened a Bookstore at Washington Square in the state of Oregon and at Westfield in California, with stores in New York, New Jersey, Illinois, and Massachusetts to follow.

That the online pioneer of all companies is now selling goods offline may come as a surprise. For Amazon, though, it is simply another logical step toward the consumer. 'It's not about online versus offline, it's about the customer experience,' says Amazon Pay VP Gauthier. 'From Amazon's point of view, the customer is presumably unbiased in the use of channels.' What matters more, he says, is that all channels are intelligently connected with one another, as in the Amazon Bookstore.

In the three glass domes currently rising at the Amazon headquarters, micro-climate zones for more than 300 plants from around the world are being created. Employees, too, are meant to find a workplace among the greenery here. (Photo: Jochen Fuchs)
It doesn't always work, despite all the financial and data muscle. Take the Treasure Truck, a lorry wrapped in Amazon marquee lighting. With this Willy Wonka version of the ice-cream van and an accompanying app, the company runs flash sales: only a single product is offered each day, a paddle boat, say, or a premium porterhouse steak, at a low price. The retro-styled truck does not seem to be a runaway success: so far it only does its rounds in Seattle. But for the online giant that is a small project, and an affordable one.

An employee named 'Robo-Stow'

More important for the company is a site in DuPont, about an hour's drive away. In the middle of the countryside, near Mount Rainier National Park, Amazon has built a 100,000-square-meter high-tech logistics center, where the online retailer is working on the future of logistics. The strongest 'employee' in the warehouse is called 'Robo-Stow': a six-ton yellow robotic arm that hoists entire pallets seven meters into the air and sets them down on a self-driving vehicle, which in turn carries the pallet autonomously through a landscape of conveyor belts into the storage area. So that humans don't get in the way, fixed markings show them which paths they are allowed to walk; leaving them means risking an accident.

Amazon's bet on automated helpers is also a matter of efficiency: a robot-driven logistics center can store considerably more goods than a conventional one. Even so, it does not manage entirely without people. From an initial 350 employees in 2014, the number has grown to 750 today. Even with flat orange robots whizzing back and forth beneath the shelves and sorting items into the racks, the human workers remain indispensable: packing, stowing, and loading trucks are still the things they do best.

In any case, there is no talk of cutbacks at Amazon right now, and not just in the warehouses. The company wants to hire 100,000 new employees in the US over the next 18 months, a third of today's headcount. If only three interviews were held for each position, that would mean 800 interviews per day, as Handelsblatt has calculated. How many of the newcomers will end up in Seattle remains to be seen. Amazon currently employs 27,000 people at its home base, and an estimated 1,000 more join each month. Once the third tower in the south is finished, Amazon will have room for around 55,000 employees.

A while ago there were rumors that the headquarters might move out of downtown to the cheaper suburbs. With the construction of the third tower in the south, those rumors have fallen silent for good. Jeff Bezos chose the city center quite deliberately: the variety offered by the surrounding shops, restaurants, and food trucks is supposed to have a positive effect on the Amazonians, and they in turn on the local economy. The company does without its own canteens or lounges; there are only a few cafeterias on the campus, and employees are meant to go out and eat in the neighborhood.

That is another of the online retailer's quirks: while other companies such as Google and Facebook pamper their employees with free snacks, Amazon holds back. Apart from water and coffee, employees get nothing for free, and in the cafeterias they pay for everything themselves. 'People who work at Amazon do it because they love the work,' one Amazon employee tells me. 'Not because of the free goodies.' If you want a Coke, you buy one. Period.

That does not mean Amazon has no company culture, just a different one. Pets, for example, enjoy a permanent right of residence in the offices. The workspaces are now home to 2,000 small animals, most of them dogs. Treats for them sit on every reception desk in the lobby. The enthusiasm goes so far that Amazon's new prestige building, the biosphere 'The Spheres', is currently being built under the code name 'Rufus II', an homage to the first office dog.

Broomball, the career game

As I walk through one of the cafeterias, I spot a gallery of pictures of proud people in athletic clothing presenting strange brooms. I stop. 'Quidditch?' I ask. My guide laughs. 'No, broomball.' The sport was invented in Amazon's early days, before the first Harry Potter books came out. The online retailer's appetite for invention does not stop at sports.

Using brooms taped together, teams drive a large inflatable ball across a field together with Jeff Bezos, traditionally on 'Picnic Day', the company outing. What sounds like fun is pursued with the utmost ambition: obscure internet legends whisper that Bezos takes the event so seriously that success or failure in the game can influence careers.

The first Amazon office. Even today the company keeps a deliberately low-key, unostentatious profile at its home base. (Photo: Amazon)
The glimpses I am given do not change the fact that I am only being allowed a narrow crack of a view into Amazon's world. A residue of reticence remains. When I ask what exactly is in this or that building, which business unit, which projects, my Amazon guide often has no answer. Perhaps the constant change makes it hard to keep track: business units are founded or dissolved, new buildings stamped out of the ground within months. Perhaps the constant change simply does not allow anyone to know Amazon inside and out. Or perhaps they just don't want to reveal it.

Perhaps Jeff Bezos could have answered my questions; unfortunately I did not meet him in Seattle. When I jokingly ask my guide about the founder, he laughs, but tells me that Bezos is still deeply involved in the day-to-day business and regularly works at the headquarters in Seattle. In theory, you could run into the boss anywhere.

Just like Amazon itself.


          Bringing neural networks to cellphones        

Image: Jose-Luis Olivares/MIT

In recent years, the best-performing artificial-intelligence systems — in areas such as autonomous driving, speech recognition, computer vision, and automatic translation — have come courtesy of software systems known as neural networks.

But neural networks take up a lot of memory and consume a lot of power, so they usually run on servers in the cloud, which receive data from desktop or mobile devices and then send back their analyses.

Last year, EECS Associate Professor Vivienne Sze and colleagues unveiled a new, energy-efficient computer chip optimized for neural networks, which could enable powerful artificial-intelligence systems to run locally on mobile devices.

Now, Sze and her colleagues have approached the same problem from the opposite direction, with a battery of techniques for designing more energy-efficient neural networks. First, they developed an analytic method that can determine how much power a neural network will consume when run on a particular type of hardware. Then they used the method to evaluate new techniques for paring down neural networks so that they’ll run more efficiently on handheld devices.

The researchers describe the work in a paper they’re presenting next week at the Computer Vision and Pattern Recognition Conference. In the paper, they report that the methods offered as much as a 73 percent reduction in power consumption over the standard implementation of neural networks, and as much as a 43 percent reduction over the best previous method for paring the networks down.

Energy evaluator

Loosely based on the anatomy of the brain, neural networks consist of thousands or even millions of simple but densely interconnected information-processing nodes, usually organized into layers. Different types of networks vary according to their number of layers, the number of connections between the nodes, and the number of nodes in each layer.

The connections between nodes have “weights” associated with them, which determine how much a given node’s output will contribute to the next node’s computation. During training, in which the network is presented with examples of the computation it’s learning to perform, those weights are continually readjusted, until the output of the network’s last layer consistently corresponds with the result of the computation.
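To make the role of weights concrete, here is a minimal NumPy sketch (my own illustration, not taken from the MIT work) of how one layer's nodes combine the previous layer's outputs; the layer sizes and the ReLU nonlinearity are arbitrary choices:

```python
import numpy as np

# One toy fully connected layer: 4 inputs feeding 3 nodes.
rng = np.random.default_rng(0)
x = rng.normal(size=4)        # outputs of the previous layer
W = rng.normal(size=(3, 4))   # connection weights, adjusted during training
b = np.zeros(3)               # per-node biases

z = W @ x + b                 # each node sums its weighted inputs
a = np.maximum(z, 0.0)        # nonlinearity (ReLU) gives the node's output
print(a)
```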

“The first thing we did was develop an energy-modeling tool that accounts for data movement, transactions, and data flow,” Sze says. “If you give it a network architecture and the value of its weights, it will tell you how much energy this neural network will take. One of the questions that people had is ‘Is it more energy efficient to have a shallow network and more weights or a deeper network with fewer weights?’ This tool gives us better intuition as to where the energy is going, so that an algorithm designer could have a better understanding and use this as feedback. The second thing we did is that, now that we know where the energy is actually going, we started to use this model to drive our design of energy-efficient neural networks.”

In the past, Sze explains, researchers attempting to reduce neural networks’ power consumption used a technique called “pruning.” Low-weight connections between nodes contribute very little to a neural network’s final output, so many of them can be safely eliminated, or pruned.

Principled pruning

With the aid of their energy model, Sze and her colleagues — first author Tien-Ju Yang and Yu-Hsin Chen, both EECS graduate students — varied this approach. Although cutting even a large number of low-weight connections can have little effect on a neural net’s output, cutting all of them probably would, so pruning techniques must have some mechanism for deciding when to stop.

The MIT researchers thus begin pruning those layers of the network that consume the most energy. That way, the cuts translate to the greatest possible energy savings. They call this method “energy-aware pruning.”

Weights in a neural network can be either positive or negative, so the researchers’ method also looks for cases in which connections with weights of opposite sign tend to cancel each other out. The inputs to a given node are the outputs of nodes in the layer below, multiplied by the weights of their connections. So the researchers’ method looks not only at the weights but also at the way the associated nodes handle training data. Only if groups of connections with positive and negative weights consistently offset each other can they be safely cut. This leads to more efficient networks with fewer connections than earlier pruning methods did.
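As a rough illustration of what pruning means in code (a magnitude-based sketch of my own, not the authors' energy-aware criterion or their sign-cancellation analysis), removing connections amounts to zeroing out the weakest entries of a layer's weight matrix:

```python
import numpy as np

def prune_by_magnitude(W, fraction):
    """Zero out the given fraction of smallest-magnitude weights in W."""
    flat = np.abs(W).ravel()
    k = int(fraction * flat.size)
    if k == 0:
        return W.copy()
    threshold = np.partition(flat, k - 1)[k - 1]
    pruned = W.copy()
    pruned[np.abs(W) <= threshold] = 0.0
    return pruned

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 6))
W_pruned = prune_by_magnitude(W, 0.5)
print("nonzero before:", np.count_nonzero(W), "after:", np.count_nonzero(W_pruned))

# The energy-aware version described above would instead rank whole layers by
# their estimated energy cost and start cutting where the savings are largest.
```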

"Recently, much activity in the deep-learning community has been directed toward development of efficient neural-network architectures for computationally constrained platforms,” says Hartwig Adam, the team lead for mobile vision at Google. “However, most of this research is focused on either reducing model size or computation, while for smartphones and many other devices energy consumption is of utmost importance because of battery usage and heat restrictions. This work is taking an innovative approach to CNN [convolutional neural net] architecture optimization that is directly guided by minimization of power consumption using a sophisticated new energy estimation tool, and it demonstrates large performance gains over computation-focused methods. I hope other researchers in the field will follow suit and adopt this general methodology to neural-network-model architecture design."

 


          Nuit Blanche in Review (July 2017)        
Since the last Nuit Blanche in Review (June 2017), it was found that Titan has interesting chemistry. On Nuit Blanche, on the other hand, we had four implementations released by their authors and several interesting in-depth articles (some of them related to SGD and hardware). We also had slides and videos from several meetings and schools, and three job offerings. Enjoy!


In-depth

SGD related

CS/ML Hardware


Slides

Videos

Job:

Other 


Credit: Northern Summer on Titan, NASA/JPL-Caltech/Space Science Institute



          Slides: Deep Learning and Reinforcement Learning Summer School 2017 @ MILA Montreal, Canada        
The Deep Learning and Reinforcement Learning Summer School 2017 just finished and here are some of the slides presented there (videos should be coming later) 




          Jabil and eyeSight Technologies form development partnership for next-generation in-car sensing technology / Greater vehicle safety through improved gesture control and driver monitoring
Jabil Inc. (NYSE:JBL) and eyeSight Technologies announce their partnership to develop a new generation of in-car sensing technology that enables state-of-the-art driver monitoring and gesture control. The collaboration combines Jabil's expertise in optical systems for the automotive industry with eyeSight's computer vision and deep learning software. Together, Jabil and eyeSight are working on a system that […]
          Tetra raises a $1.5M seed round to bring deep learning to voice transcription        
 There are a million and one services for voice transcription on the market. But even with just one job to do, I’ve never seen a service that can handle the long tail of vocabulary used in the real world. This is particularly challenging if you’re a startup trying to sell your service to enterprises that rely on accurate transcription for their operations. Read More

          An Algorithm Trained on Emoji Knows When You're Being Sarcastic on Twitter        
Technology Review

Researchers at the Massachusetts Institute of Technology (MIT) have developed a deep-learning textual sentiment algorithm that was trained on emoji, and which can analyze tweets to pick up sarcasm and general emotional subtext. "The neural network learned the connection between a certain kind of language and an emoji," says MIT professor Iyad Rahwan. The DeepMoji algorithm was trained on 1.2 million tweets containing some combination of 64 popular emoji. The team trained the system to anticipate which emoji would be used with a particular message, based on whether the emoji reflected a particular emotion or sentiment. The researchers then taught DeepMoji to recognize sarcasm using an existing set of labeled examples. Testing showed DeepMoji outperformed both other top sentiment-detecting algorithms and humans in the identification of sarcasm and other emotions on Twitter. Experts say the research demonstrates that computers are gradually becoming more adept at sensing human emotion.

From "An Algorithm Trained on Emoji Knows When You're Being Sarcastic on Twitter"
Technology Review (08/03/17) Will Knight
          IBM speeds deep learning by using multiple servers        

For everyone frustrated by how long it takes to train deep learning models, IBM has some good news: It has unveiled a way to automatically split deep-learning training jobs across multiple physical servers -- not just individual GPUs, but whole systems with their own separate sets of GPUs.

Now the bad news: It's available only in IBM's PowerAI 4.0 software package, which runs exclusively on IBM's own OpenPower hardware systems.

Distributed Deep Learning (DDL) doesn't require developers to learn an entirely new deep learning framework. It repackages several common frameworks for machine learning: TensorFlow, Torch, Caffe, Chainer, and Theano. Deep learning projects that use those frameworks can then run in parallel across multiple hardware nodes.



          Episode 122: #122: You’d Better Recognize        

This week Dave and Gunnar talk about recognition: facial recognition, keystroke recognition, Dothraki recognition.


Cutting Room Floor


          REGISTER FOR ONE OF THE MOST COMPELLING BIG DATA EVENTS IN SYDNEY BEFORE IT SELLS OUT!        

Join us in Sydney on September 20-21 for DataWorks Summit/Hadoop Summit, the industry’s premier big data community event. Attend this year and learn from your peers and industry experts how open source technologies enable you to leverage all your data, on premise and in the cloud, to drive predictive analytics, distributed deep-learning, and artificial intelligence […]

The post REGISTER FOR ONE OF THE MOST COMPELLING BIG DATA EVENTS IN SYDNEY BEFORE IT SELLS OUT! appeared first on Hortonworks.


          New A.I. Course Focuses on Deep Learning        

There’s no denying that artificial intelligence and machine learning are drawing a lot of buzz. Company CEOs brag about integrating […]

The post New A.I. Course Focuses on Deep Learning appeared first on Dice Insights.


          How to Use Metrics for Deep Learning with Keras in Python        

The Keras library provides a way to calculate and report on a suite of standard metrics when training deep learning models. In addition to offering standard metrics for classification and regression problems, Keras also allows you to define and report on your own custom metrics when training deep learning models. This is particularly useful if […]
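To give a flavor of the API being described (my own minimal sketch, not code from the post), a custom Keras metric is just a function of `y_true` and `y_pred` built from backend ops and passed to `compile()`:

```python
import numpy as np
from keras import backend as K
from keras.models import Sequential
from keras.layers import Dense

def rmse(y_true, y_pred):
    # Custom metric: root mean squared error, reported alongside the loss.
    return K.sqrt(K.mean(K.square(y_pred - y_true), axis=-1))

model = Sequential([Dense(1, input_dim=4)])
model.compile(optimizer='adam', loss='mse', metrics=['mae', rmse])

X = np.random.rand(128, 4)
y = np.random.rand(128, 1)
model.fit(X, y, epochs=2, verbose=2)  # 'rmse' shows up in the training log
```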

The post How to Use Metrics for Deep Learning with Keras in Python appeared first on Machine Learning Mastery.


          10 Command Line Recipes for Deep Learning on Amazon Web Services        

Running large deep learning processes on Amazon Web Services EC2 is a cheap and effective way to learn and develop models. For just a few dollars you can get access to tens of gigabytes of RAM, tens of CPU cores, and multiple GPUs. I highly recommend it. If you are new to EC2 or the […]

The post 10 Command Line Recipes for Deep Learning on Amazon Web Services appeared first on Machine Learning Mastery.


          How to Get Good Results Fast with Deep Learning for Time Series Forecasting        

3 Strategies to Design Experiments and Manage Complexity on Your Predictive Modeling Problem. It is difficult to get started on a new time series forecasting project. Given years of data, it can take days or weeks to fit a deep learning model. How do you get started exactly? For some practitioners, this can lead to […]

The post How to Get Good Results Fast with Deep Learning for Time Series Forecasting appeared first on Machine Learning Mastery.


          Microsoft Cognitive Toolkit 2.0        

Last year, Microsoft released the first version of Cognitive Toolkit (CNTK), a toolkit dedicated to deep learning and artificial intelligence.

Cognitive Toolkit describes neural networks as a series of computational steps along a directed graph. The leaf nodes of the graph are input values or network parameters, and the other nodes are matrix operations applied to their inputs.

Microsoft has just released version 2.0 of Cognitive Toolkit. This release brings new features and bug fixes, but it is not fully backward compatible with Cognitive Toolkit 1.0.

Cognitive Toolkit exposes APIs for Java, C# and Python, and supports CUDA.

Microsoft Cognitive Toolkit 2.0 is available on GitHub.



          Tony Frazier: DigitalGlobe Applies Crowdsourcing, Deep-Learning Tools in Satellite Image Analysis        
Tony Frazier, senior vice president of government solutions at DigitalGlobe, has said the company works to analyze large amounts of satellite images and data through deep-learning tools and crowdsourcing, GCN reported Wednesday. Frazier told GCN columnist Patrick Marshall in an interview that OpenCV computer vision software library, Caffe deep-learning framework and NVIDIA graphics processors are some of the deep-learning systems […]
          Link roundup #8        
Quite a backlog of good links this time!


          ä»Žç»Ÿè®¡å­¦è§’度来看深度学习(3):记忆和核方法        

Original post: http://blog.shakirm.com/2015/04/a-statistical-view-of-deep-learning-iii-memory-and-kernels/

Author: Shakir Mohamed


The regression methods that connect machine learning

People infer future things by recalling past experience or data, a process that can be summed up by a word that appears frequently in the recent literature: memory. Machine learning models are built out of such 'memories', and understanding these memories is essential for understanding how to use the models. Depending on the kind of machine learning model, there are two main memory mechanisms, parametric and non-parametric (plus models that sit in between). Deep networks are the representative parametric memory model: they distill the statistical properties of the observed data into the model's parameters or weights. The exemplar of non-parametric models is the kernel machine (and nearest neighbours), whose memory mechanism is to store all the data. It is natural to think of deep networks and kernel machines as two methods that draw conclusions from data on different principles, but the way these methods were developed shows that they are deeply connected and fundamentally similar.

Deep networks, kernel machines, and Gaussian processes form a coherent set of methods for solving the same problem. Their final forms are very different, but they are intrinsically related, and understanding this is very useful for deeper research. This connection is exactly what this post explores.


          ä»Žç»Ÿè®¡å­¦è§’度来看深度学习(2):自动编码器和自由能        

Original post: http://blog.shakirm.com/2015/03/a-statistical-view-of-deep-learning-ii-auto-encoders-and-free-energy/

This translation was authorized by the original author, Shakir Mohamed; it was translated by Zhong Yan and reviewed by He Tong. Many thanks for their support and help.

Discriminative models based on feed-forward deep neural networks have succeeded in many industrial applications, sparking a rush to find out whether unsupervised learning methods can deliver similar results. The denoising auto-encoder is one of the main unsupervised learning methods in deep learning. This post explores the connection between denoising auto-encoders and density estimation in statistics: we examine the denoising auto-encoder from a statistical perspective and view it as an inference problem in a latent factor model. Our machine learning applications can draw inspiration and benefit from this connection.

Generalised Denoising Auto-encoders (GDAEs)

Denoising auto-encoders are a major advance in unsupervised deep learning; they greatly improve the scalability and robustness of data representations. For each data point y, a denoising auto-encoder first uses a known corruption process $\mathcal{C}(\mathbf{y}'|\mathbf{y})$ to construct a noisy version $\mathbf{y}'$ of $\mathbf{y}$; we then feed $\mathbf{y}'$ into a neural network that tries to recover the original data $\mathbf{y}$. The whole learning network can be split into two parts, an encoder and a decoder, where the output of the encoder $\mathbf{z}$ can be regarded as a representation or feature of the original data. The objective function for this problem is as follows:

$$\textrm{Perturbation:}\quad \mathbf{y}’ \sim\mathcal{C}(\mathbf{y}’|\mathbf{y})$$

$$\textrm{Encoder:}\quad \mathbf{z(y’)} = f_\phi (\mathbf{y’})\qquad\textrm{Decoder:}\quad \mathbf{y} \approx g_\theta (\mathbf{z})$$

$$\textrm{Objective:}\quad\mathcal{L}_{DAE} = \log p(\mathbf{y} |\mathbf{z})$$
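A minimal Keras sketch of the setup above (my own illustration; it assumes additive Gaussian corruption and a Gaussian likelihood, under which maximizing $\log p(\mathbf{y}|\mathbf{z})$ reduces to minimizing squared reconstruction error):

```python
import numpy as np
from keras.models import Model
from keras.layers import Input, Dense

# Toy data y and its corrupted version y' ~ C(y'|y): additive Gaussian noise.
rng = np.random.RandomState(0)
Y = rng.rand(1000, 20)
Y_noisy = Y + 0.1 * rng.randn(*Y.shape)

inp = Input(shape=(20,))                   # y'
z = Dense(8, activation='relu')(inp)       # encoder z(y') = f_phi(y')
out = Dense(20, activation='linear')(z)    # decoder g_theta(z), reconstructs y

dae = Model(inp, out)
dae.compile(optimizer='adam', loss='mse')  # Gaussian log-likelihood up to a constant
dae.fit(Y_noisy, Y, epochs=5, batch_size=64, verbose=2)
```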


          ä»Žç»Ÿè®¡å­¦è§’度来看深度学习(1):递归广义线性模型        

Original post: http://blog.shakirm.com/2015/01/a-statistical-view-of-deep-learning-i-recursive-glms/

Author: Shakir Mohamed

This translation was authorized by the original author, Shakir Mohamed; it was translated by Wang Xiaoning and reviewed by Feng Lingbing and Zhu Xuening. Many thanks for their support and help.

Deep learning and its applications have become a key tool in practical machine learning. Neural networks stand on an equal footing with many existing statistical and machine learning methods, and in this post I explore one way of looking at that relationship.

We take a particular view of deep neural networks here: they can be seen as recursive generalised linear models. Generalised linear models are one of the cornerstones of probabilistic modelling and are ubiquitous and extremely useful in applications throughout the experimental sciences. This post concentrates on feed-forward neural networks; the statistical connections of recurrent networks will be discussed in a later post.

Generalised Linear Models (GLMs)

The basic linear regression model is a linear map from a P-dimensional space of covariates X to the space of a set of response variables Y. Concretely, the linear map weights X by a set of weights (regression coefficients) $\beta$ and adds an intercept term $\beta_0$. The output of a linear regression can be multivariate, but in this post it is assumed to be scalar. The full probabilistic model assumes that this linear model is perturbed by Gaussian noise (whose variance is usually assumed unknown).

$$\eta=\beta^Tx+\beta_0$$

$$y = \eta+\epsilon \qquad \epsilon \sim \mathcal{N}(0,\sigma^2)$$

In this formulation, $\eta$ is the systematic component of the model and $\epsilon$ is the random disturbance. Generalised linear models (GLMs) [2] allow us to extend this model so that the distribution of the response variable is not restricted to the Gaussian but can be drawn from a much wider class (for example, the exponential family). In that case, combining the coefficients and the bias into a more compact notation, we can write the generalised regression problem as:

$$\eta = \beta^\top x, \qquad \beta=[\hat \beta, \beta_0], x = [\hat{x}, 1]$$

$$\mathbb{E}[y] = \mu = g^{-1}(\eta)$$

where g(·) is the link function, which lets us obtain the mean parameter $\mu$ from the natural parameter $\eta$. If the link function is taken to be the logistic function, the mean parameter corresponds to the probability of a Bernoulli-distributed y being 1 or 0.

There are many other link functions that let us make different assumptions about the distribution of the target (response) variable y. In deep learning, link functions are usually referred to as activation functions, and the table below lists their names in the two fields. The table shows that many popular methods are the same in neural networks and in statistics but (sometimes) carry completely different names in the two literatures, e.g. multinomial regression in statistics versus softmax classification in deep learning, or the rectifier in deep learning versus the Tobit (censored regression) model in statistics: they are really the same thing.

| Target type | Regression | Link | Inverse link | Activation |
|-------------|------------|------|--------------|------------|
| Real | Linear | Identity | Identity | |
| Binary | Logistic | Logit $\log\frac{\mu}{1-\mu}$ | Sigmoid $\frac{1}{1+\exp(-\eta)}$ | Sigmoid |
| Binary | Probit | Inverse Gaussian CDF $\Phi^{-1}(\mu)$ | Gaussian CDF $\Phi(\eta)$ | Probit |
| Binary | Gumbel | Compl. log-log $\log(-\log(\mu))$ | Gumbel CDF $e^{-e^{-x}}$ | |
| Binary | Logistic | | Hyperbolic tangent $\tanh(\eta)$ | Tanh |
| Categorical | Multinomial | | Multinomial logit $\frac{\eta_i}{\sum_j \eta_j}$ | Softmax |
| Counts | Poisson | $\log(\mu)$ | $\exp(\nu)$ | |
| Counts | Poisson | $\sqrt{\mu}$ | $\nu^2$ | |
| Non-negative | Gamma | Reciprocal $\frac{1}{\mu}$ | $\frac{1}{\nu}$ | |
| Sparse | Tobit | | $\max(0;\nu)$ | ReLU (rectified linear unit) |
| Ordinal | Ordinal regression | Cumulative logit | | |
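A minimal NumPy illustration of this correspondence (my own sketch, not from the post): a binary GLM with a logistic link is exactly one 'layer' with a sigmoid activation, fitted here by gradient ascent on the Bernoulli log-likelihood; stacking such layers recursively is what gives a deep network.

```python
import numpy as np

def sigmoid(eta):
    # Inverse logistic link, a.k.a. the sigmoid activation.
    return 1.0 / (1.0 + np.exp(-eta))

rng = np.random.default_rng(0)
X = np.c_[rng.normal(size=(200, 3)), np.ones(200)]   # x = [x_hat, 1]
true_beta = np.array([1.5, -2.0, 0.5, 0.2])
y = rng.binomial(1, sigmoid(X @ true_beta))

beta = np.zeros(4)
for _ in range(5000):
    mu = sigmoid(X @ beta)                 # E[y] = g^{-1}(beta^T x)
    beta += 0.5 * X.T @ (y - mu) / len(y)  # gradient of the log-likelihood

print(np.round(beta, 2))  # should land near true_beta

# The deep-network view: apply eta_l = W_l a_{l-1}, a_l = g_l^{-1}(eta_l)
# layer after layer, i.e. a recursive generalised linear model.
```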



          How AI can help make safer baby food (and other products)        

Editor’s note: Whether you’re growing cucumbers or building your own robot arm, machine learning can help. In this guest editorial, Takeshi Ogino of Kewpie tells us how they used machine learning to ensure the quality and safety of the ingredients that go into their food products.

Quality control is a challenge for most industries, but in the world of food production, it’s one of the biggest. With food, products are as good as the ingredients that go into them. Raw materials can vary dramatically, from produce box to produce box, or even from apple to apple. This means inspecting and sorting the good ingredients from the bad is one of the most important tasks any food company does. But all that work inspecting by hand can be time-consuming and arduous both in terms of overhead and manpower. So what’s a food company to do?

At Kewpie Corporation, we turned to a surprising place to explore better ways to ensure food quality: artificial intelligence built on TensorFlow.

Although Kewpie Corporation is most famous for our namesake mayonnaise, we’ve been around for 100 years with dozens of products, from dressings to condiments to baby foods. We’ve always believed that good products begin with good ingredients.


Ingredients that are safe and also give you peace of mind

Last October, we began investigating whether AI and machine learning could ensure the safety and purity of our ingredients faster and more reliably than ever.

The project began with a simple question: “What does it mean to be a ‘good’ ingredient?” The ingredients we purchase must be safe, of course, and from trustworthy producers. But we didn’t think that went far enough. Ingredients also need to offer peace of mind. For example, the color of potatoes can vary in ways that have nothing to do with safety or freshness.

Kewpie depends on manual visual detection and inspection of our raw ingredients. We inspect the entire volume of ingredients used each day, which, at four to five tons, is a considerable workload. The inspection process requires a certain level of mastery, so scaling this process is not easy. At times we’ve been bottlenecked by inspections, and we’ve struggled to boost production when needed.

We’d investigated the potential for mechanizing the process a number of times in the past. However, the standard technology available to us, machine vision, was not practical in terms of precision or cost. Using machine vision meant setting sorting definitions for every ingredient. At the Tosu Plant alone we handle more than 400 types of ingredients, and across the company we handle thousands.

That’s when I began to wonder whether using machine learning might solve our problem.

Using unsupervised machine learning to detect defective ingredients

We researched AI and machine learning technology across dozens of companies, including some dedicated research organizations. In the end, we decided to go with TensorFlow. We were impressed with its capabilities as well as the strength of its ecosystem, which is global and open. Algorithms that are announced in papers get implemented quickly, and there’s a low threshold for trying out new approaches.

One great thing about TensorFlow is that it has such a broad developer community. Through Google, we connected with our development partner, BrainPad Inc, who impressed us with their ability to deliver production level solutions with the latest deep learning. But even BrainPad, who had developed a number of systems to detect defective products in manufacturing processes, had never encountered a company with stricter inspection standards than ours. Furthermore, because our inspections are carried out on conveyor belts, they had to be extremely accurate at high speeds. Achieving that balance between precision and speed was a challenge BrainPad looked forward to tackling.

Sorting diced potato pieces at the Tosu Plant.

To kick off the project, we started with one of our most difficult inspection targets: diced potatoes. Because they’re an ingredient in baby food, diced potatoes are subject to the strictest scrutiny both in terms of safety and peace of mind. That meant feeding more than 18,000 line photographs into TensorFlow so that the AI could thoroughly learn the threshold between acceptable and defective ingredients.

Our big breakthrough came when we decided to use the AI not as a "sorter" but as an "anomaly detector." Designing the AI as a sorter meant supervised learning, a machine learning model that requires labels for each instance in order to train accurately. In this case that meant feeding into TensorFlow an enormous volume of data on both acceptable and defective ingredients, and it was hugely challenging for us to collect enough defective sample data. By training the system as an anomaly detector instead, we could employ unsupervised learning, which meant we only needed to feed it data on good ingredients. The system was then able to learn how to identify acceptable ingredients, and reject as defective any ingredients that failed to match. With this approach, we achieved both the precision and speed we wanted, with fewer defective samples overall.
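This is not Kewpie's actual system, but a minimal sketch (with made-up stand-in data) of the general pattern described here: train an autoencoder on acceptable samples only, then flag anything that reconstructs unusually badly.

```python
import numpy as np
from keras.models import Model
from keras.layers import Input, Dense

# Stand-in for feature vectors extracted from images of *acceptable* ingredients.
rng = np.random.RandomState(0)
good = rng.normal(size=(2000, 32))

inp = Input(shape=(32,))
h = Dense(8, activation='relu')(inp)
out = Dense(32, activation='linear')(h)
autoencoder = Model(inp, out)
autoencoder.compile(optimizer='adam', loss='mse')
autoencoder.fit(good, good, epochs=10, batch_size=64, verbose=0)

# Threshold chosen from reconstruction errors on known-good samples.
errors = np.mean((autoencoder.predict(good) - good) ** 2, axis=1)
threshold = np.percentile(errors, 99)

def looks_defective(x):
    """Flag a sample whose reconstruction error exceeds the threshold."""
    err = np.mean((autoencoder.predict(x[None, :])[0] - x) ** 2)
    return err > threshold

print(looks_defective(good[0]), looks_defective(good[0] + 5.0))
```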

By early April, we were able to test a prototype at the Tosu Plant. There, we ran ingredients through the conveyor belt and had the AI identify which ones were defective. We had great results. The AI picked out defective ingredients with near-perfect accuracy, which was hugely exciting to our staff.

The inspection team at the Tosu Plant.

It’s important to note that our goal has always been to use AI to help our plant staff, not replace them. The AI-enabled inspection system performs a rough removal of defective ingredients, then our trained staff inspects that work to ensure nothing slips through. That way we get “good” ingredients faster than ever and are able to process more food and boost production.

Today we may only be working with diced potatoes, but we can’t wait to expand to more ingredients like eggs, grains and so many others. If all goes well, we hope to offer our inspection system to other manufacturers who might benefit. Existing inspection systems such as machine vision have not been universally adopted in our industry because they're expensive and require considerable space. So there’s no question that the need for AI-enabled inspection systems is critical. We hope, through machine learning, we’re bringing even more safe and reassuring products to more people around the world.


          The Imperative to Democratize Artificial Intelligence         
November 30, 2016 (LocalOrg) - MIT Technology Review recently published an article titled, "An AI Ophthalmologist Shows How Machine Learning May Transform Medicine." In it, it describes how Google researchers at their DeepMind subsidiary used artificial intelligence (AI) to scan images of human eyes to detect a common form of blindness as well as, or better than trained experts can.


They achieved this by using the same machine learning techniques Google and other tech giants including Facebook use to analyze images that show up on their web platforms. Instead of creating complex programs to handle every conceivable detail in an image, researchers instead teach machines how to learn on their own when exposed to large volumes of pre-tagged examples.

In the MIT Technology Review article, DeepMind's algorithm studied some 128,000 retinal images that were already classified by ophthalmologists.

The breakthrough is only the latest in a long line of advances in AI. AI machine learning is already being widely used in real-world applications, including sifting through the United Kingdom's National Health Service's records, automatically tagging - and flagging - images, videos, and voice across vast social networks, improving efficiency at utility plants by spotting trends and automatically adjusting power consumption, inputs, and outputs, as well as developing protocols for both pharmaceutical production and genetic engineering.



DeepMind's research into analyzing medical imagery is already set to be integrated into its UK NHS collaboration, according to the Guardian in an article titled, "Google DeepMind pairs with NHS to use machine learning to fight blindness," which reports:
Google DeepMind has announced its second collaboration with the NHS, working with Moorfields Eye Hospital in east London to build a machine learning system which will eventually be able to recognise sight-threatening conditions from just a digital scan of the eye. 

The collaboration is the second between the NHS and DeepMind, which is the artificial intelligence research arm of Google, but Deepmind’s co-founder, Mustafa Suleyman, says this is the first time the company is embarking purely on medical research. An earlier, ongoing, collaboration, with the Royal Free hospital in north London, is focused on direct patient care, using a smartphone app called Streams to monitor kidney function of patients.
In essence, those who control AI technology have access to algorithms that can perform specific tasks better than any trained human can. This confers an immense advantage on those who control the technology and creates a disparity that those without AI technology have no means of competing against.


Corporations and nations wielding this power, as the number of applications expands, represent an alarming, emerging disparity that may lead to the same sort of abuses and exploitation other forms of technological disparity throughout history have wrought.

Democratizing AI 

Effort into developing AI applications involves big-data. Training machines rather than merely programming them, means exposing them to large amounts of information they can sift through and train themselves with. In order to do this, not only do large amounts of information need to be collected, they need to be tagged or otherwise classified so machines have a baseline to improve against.

Image: A deep learning developer box, via CADnetwork.
The development of these large data sets, as well as developing algorithms to exploit them, requires (at the moment) large numbers of participants outside of corporations like Google and their subsidiaries like DeepMind.

Toward that end, open-source software libraries for machine learning, like Google's TensorFlow, are available online for free. GitHub, an online development repository, offers access to a wide range of other machine learning libraries that coders and programmers can use.

The physical hardware currently used to build deep learning machines includes GPUs (Graphics Processing Units) similar to those found in high-end gaming computers. Instructions for building deep learning machines are available online, including information provided by companies like NVIDIA that make commercially available GPUs.

While it remains to be seen what individual or independent groups of developers can achieve in terms of democratizing this technology, it may be in the best interests of nation-states to begin developing their own AI programs rather than wait for Google, Facebook, and even China's Baidu to "share" this technology with them.

It may also be in their best interests to examine the merits of promoting the democratization of this technology. Where a lack of resources to acquire high-level researchers at an institutional level exists, democratizing and thus tapping a larger pool of talent to even the odds in the AI race while also raising public literacy regarding this increasingly pivotal technology may be an alternative option.

Research into AI cannot be "banned" and breakthroughs cannot be "un-invented." With the tools already widely (and in some cases, freely) available to advance AI, attempts to put this civilization-changing technology "back in the box" will only waste time and resources. The only way to counter the harmful application of AI is by possessing an equal or greater capacity to utilize the technology and increase the number of people both educated in how it works, and capable of applying it in reaction to harmful exploitation of it.

Just like information technology, nuclear weapons, or even firearms tilted the global balance of power in favor of those who initially wielded them before more acquired and exploited these technologies, AI too poses a threat unless and until it is more widely adopted and democratized.

With the power to focus on and master any task at superhuman levels, we ignore the challenge to balance this emerging power at our own peril.

LocalOrg seeks to explore local solutions to global problems by empowering people locally with education and technology to not only survive, but to thrive.
 

          Cloudworld: A Hegelian Theory of Complexity and Algorithmic Reality        
Philosophy could be an important conceptual resource in the determination of human-technology interactions for several reasons. First, philosophy concerns the topics of world, reality, self, society, aspirations, and meaning, all of which we are hoping to reconfigure and accentuate in our relations with technology. Improving human lives is after all one of the main purposes of technology. Second, philosophy relates to thinking, logic, reasoning, and being, which are the key properties of what we would like our technology entities to do. We would like our technology entities to be more like persons: pre-uncanny valley but fully-fledged tech others; thinker helpers, empathic listeners, coaches, optimizers; a new kind of technology-presenced companion. However, ensconced in recent computational advances, it has been neglected to look to thinking about thinking as a primary resource. Third, philosophy treats the grasping and naming of new things in the world, which is precisely helpful in the case of new and quickly-emerging technological realities.

Hegel could be a potentially helpful position in the consideration of the governance of emerging technologies. This is because the Hegelian reference point is specifically a moving dialogical expanding and not a pre-specified moment in response to unfolding situations. The Hegelian method involves triads: there is the thing itself, its negation, and a bigger third position that sublates the truth content out of the two previous positions into a new shape of its own consciousness. This kind of conceptual robustness could help in articulating more nuanced positions regarding emerging technologies and moving beyond stark binaries like ‘adopt-or-don’t adopt,’ technological dualism that ‘any technology has both good and evil uses,’ and a seemingly inevitable hopelessness in the face of existential risk.

The current situation of emerging technology is one of algorithmic reality. Not only are more new kinds of technology entities having a substantial presence in our human reality, where we are interacting with them on a regular basis, there is a sense of a quickening progression of these entities. There are drones, self-driving cars, personal home robots, quantified-self gadgets, Siri-commanded mobile phones, blockchain smart contract DACs, tradenets, deep-learning algorithms, big data clouds, brain-computer interfaces, neural hacking devices, augmented reality headsets, and deep-learning gaming worlds. Further, each of these technology classes is itself a platform, network, and app store, where the implication is cloudworld. Cloudworld is the notion of a deep multiplicity of networks as a compositional element of new algorithmic realities, where every network is a Turing-complete general computational substrate for every other. Any technology can immediately ‘grok,’ simulate, and run any other; the meaning of which from our human standpoint is vastly unclear. Derivatively, any sort of cloudmind (clustered interactions between multiple human minds or entities (e.g.; artificial intelligence) coordinated via the Internet cloud) might run on any platform.

A Hegelian theory of algorithmic reality is a complexity philosophy position, meaning that it has the properties of a complex adaptive system in being nonlinear, emergent, dynamic, open, unknowable, self-organizing, and interdependent. A complexity philosophy position is required to congruently correspond to the underlying reality which is itself complex. Algorithmic reality is not just an increasing degree of human-technology entity interaction but a multiplicity and proliferation of classes of network technology entities. The Hegelian position is exactly one that might constitute a bigger yes-and collaboration space that expansively accommodates all parties.

Inspiration: Minsky's legacy in the context of contemporary and near-future AI

          Machine Trust Language (MTL): Human-Machine Collaboration        
Andreas Antonopoulos’s articulation of network-enforced trust primitives (Oct 2015, Feb 2014) could be extended more broadly into the concept of Machine Trust Language (MTL). While blockchains are being popularly conceived as trust machines, and as a new mode of creating societal shared trust, Andreas addresses how at the compositional level, this trust is being generated. The key idea is thinking in terms of a language of trust, of its primitives, its quanta, its elemental pieces, its phonemes, words, and grammar that can be assembled into a computational trust system.

Blockchains are a network-centric trust system that can make and enforce promises. A network is not just a decentralized architecture; a network can have functional properties built into it. Network-centric or network-enforced functionality can thus enable a more complex level of activity. As XML standardized, facilitated, and undergirded Internet I: the Internet of information transfer, MTL could similarly for the Internet II: the Internet of value transfer.

Trust Primitives: Technical Details
The atomistic building blocks of trust, trust primitives, arise from blockchain scripting languages; they are the programming functions or opcodes used to specify different situations. Some examples are OP_CHECKSIG (a script opcode used to verify that a signature is valid) and OP_CHECKLOCKTIMEVERIFY (a script opcode used for a transaction output to be made unspendable until some point in the future).

As human language components are aggregated into different levels (phonemes, morphemes, lexemes, syntax, and context), so too can blockchain trust primitives. These indivisible blockchain trust particles, trust quanta, can be assembled into larger trust structures like payments. One example could be a micropayment channel with bidirectional settlement for vendor payment, for example entered in 1000 blocktime confirmations for 10 millibits. There could be libraries of standard trust primitives that are always included, for example, to verify the signature or multi-signature status of any transaction. The possibility of fine-grained trust primitives is limitless – a very small instruction set can be used as a toolkit for innovation that is composed into infinitely complex macro expressions. Some other examples Andreas mentions in addition to payment channels are stealth addresses, payment codes, and multisig escrows.
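To make the composition idea concrete, here is a small hypothetical sketch in Python; the opcode names follow Bitcoin script (the timelock pattern is the BIP-65 'freeze funds' example), but the assembly functions are my own illustration, not a real transaction builder:

```python
# Hypothetical sketch: composing trust primitives (script opcodes) into larger
# trust structures. Illustrative only; not a real wallet or transaction builder.

def freeze_until(lock_height, pubkey):
    """Output spendable only after block `lock_height`, and only by `pubkey`."""
    return [str(lock_height), "OP_CHECKLOCKTIMEVERIFY", "OP_DROP",
            pubkey, "OP_CHECKSIG"]

def two_of_three_escrow(buyer, seller, arbiter):
    """Multisig escrow: any two of the three parties can release the funds."""
    return ["OP_2", buyer, seller, arbiter, "OP_3", "OP_CHECKMULTISIG"]

print(" ".join(freeze_until(500000, "<vendor_pubkey>")))
print(" ".join(two_of_three_escrow("<buyer_key>", "<seller_key>", "<arbiter_key>")))
```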

More sophisticated examples of in-built blockchain trust are already starting to become conceptual standards. One is Lighthouse, a cryptowallet that has crowdfunding (the ability to pledge funds to an address) as an incorporated feature; essentially a decentralized network Kickstarter program. The Kickstarter functionality is in the program (there is no custodian); just as Bitcoin allows digital currency transfers without a central bank, so too the Lighthouse wallet coordinates crowdfunding for projects without a central intermediary like Kickstarter. A whole series of similar network primitives with embedded trust functionality can be envisioned. These could include crowdfunding, reputation-checking, backfeeding (emergent collaboration), insurance, multisig, payment channels, peer-to-peer tipping (ProTip), compensation, remuneration, micropayments, IP tracking, backup (specified blockchain transaction record-keeping and archival), and advocacy (via third-party oracle like Smart Contract and Early Temple).

Trust as a Feature: Human-Machine Social Contracting
When trust becomes a ‘mere’ assumed included feature as opposed to a marveled at and explicitly designed functionality, we will have really arrived somewhere as a species. In some sense, the entire apparatus and infrastructure known as society has been produced to instill and manage trust. Deception had an evolutionary benefit, but is perhaps a quality that can be reconfigured, first in machine-mediated human interaction, and later in human biology. The longer-term endgame of blockchains-as-algorithmic-trust is human-machine collaboration, particularly in the application of shifting from the labor economy to the actualization economy. Given the increasing potential prevalence of machines in human existence, a looming topic is the kinds of social contracts that may be appropriate to establish between machines and humans. For example, consider what trust primitives might be needed to write a smart contract with your personalized home robot. To open a payment channel with your home robot, first could be identifying the relevant exchange streams for services and data. These might include personal data, life-logging, backup, diagnostics, advice, empathy, sound-boarding, home maintenance services, payments, and record-keeping; a list of operations that make sense to conduct in a ‘payment channel’ structure (e.g.; two-way open transfer over time of value between parties per triggering events).

A New Kind of Language
Here the concept would be considering the possibility space of all language and noticing that there could likely be a bigger range of language than has come into existence so far. There are human languages, computational languages, math, logic, and other systems of semantics and signifying. As seen with examples like math (Husserl), computing algorithms (Wolfram), intelligence (Yudkowsky), and self-assembled locomotion (Lipson) and life forms, what has been seen through the human example may be but a few nodes in a larger possibility space. The bigger query would be what new kinds of language can be made with blockchain trust primitives. Not just solving human problems (e.g.; creating automated trust structures) but creating new languages from these new functionalities. One next step could be applying linguistic theory (Chomsky, etc.), concept theory (Lakoff, Kant, etc.), and mathematics, logic, computation, complexity math, machine-learning, and deep-learning theory to creating platforms for the emergence of new kinds of language. The first task might be to optimize for obvious new types of trust language that might be possible and that might solve low-hanging fruit problems like offloading the cognitive and behavioral energy effort of deception to move to Brin’s Transparent Society. Blockchain trust could be for society what the quantified self fourth-person perspective was for the individual (a trustable independent objective arbitrator of information about reality).

Philosophy: A New Kind of Qualitative Language
A language of trust is undeniably qualitative. Trust is exactly the qualitative easing necessary for society to function, including in more intensive human-machine collaborations, and in larger scale universally-global and extraterrestrial singularity-class endeavors. Is it possible to reach a place with computational language to say what cannot be said with human language? Perhaps not in traditional 1s/0s computational language, but with a new kind of language of qualitative trust primitives, maybe yes. Wittgenstein famously said (the type of) all there is that can be said in the Tractatus, and in this crystallization pointed to what cannot be said, in three domains, ethics, aesthetics, and religion. Now thinking in terms of trust primitives and other qualitative primitives changes the question of what kinds of sentences and language can be written; the grammar and Wittgensteinian language games that can be enacted with blockchains; in an AI DAC and other applications. There could be many diverse blockchain cliometrics implementations in MTL; e.g.; the measurement of social qualitative factors like the amount of liberty in a political system. The notion is qualitative primitives and qualitative machine language; having a pourable bag of trust elements as components. There are trust primitives, and possibly many other kinds of qualitative primitives, for example freedom, autonomy, and choice primitives; idea primitives and innovation primitives; all of these could be on tap in a multi-faceted qualitative machine language to configure a life of crypto enlightenment

          VR Chains and DAC Brains: Upload your mind as a VR AI DAC        
Blockchain thinkers or DAC Brains are the notion of having DAO/DAC entities running with smart contracts on blockchains for the purpose of conducting thinking operations. The genesis of blockchain thinkers could be organic or inorganic: human mindfile lifelogs and uploads, and any variety of brain emulations and AI ML/DL algorithms (artificial intelligence machine-learning deep-learning algorithms). One idea is to instantiate your mindfile on the blockchain as a lifelogging tracker and standalone ideation tool: your own mind as an AI DAC. Some key enablers are coming together to make personal AI DACs possible. Idea chains (lifelogging your ideas onto a blockchain) could auto-record your ideas through 1) QS (quantified self)-attached gamma wave spike tracking (recording when you are having an idea), together with 2) cortical image recognition and thought identification (what the idea is about), logged into in a 3) personalized blockchain-based VR consensus reality (coordinating ideas into your own ongoing reality view).

Immersive Virtual Reality is Digitized Experience
Immersive VR (virtual reality), like with the Oculus Rift, is not just video games, virtual worlds, or 3-D CAVE environments, it is digitized experience. Qualitatively different, immersive virtual reality is a means of making physical world experiences real in an alternative medium. VR metaverses then, are parallel realities, as distinct from multiple digital worlds. If you and I go into WoW (World of Warcraft) or SL (Second Life) separately, we see the same world. Even if different levels of views are enabled or locked (like Karl Schroeder’s tech locks in Lady of Mazes [1]), they are just different lenses on the same world. However, if you and I construct our own digital worlds, we see and create different worlds, possibly on the same basic platform, but the realities can be fundamentally different, with different participants, events, and historical records.

Reality Unity in the Physical World
Consider the physical world - there is one platform, and we each have varying reality maps or views of the physical reality platform in our heads. There is one consensus reality and historical event record, and conflicts arise out of different views of the consensus reality trying to hew to one (e.g.; “What happened? X punched Y first. No, Y shoved X first.” – we seek a unique consensus reality of events (Probability Moon further explores the notion of societal shared reality)). Centralized virtual worlds have been the same; there is one reality platform, and centralized event engines record the consensus in one shared events ledger, the game history, even in OpenGL self-hosted models. Now, however, with decentralized models powered by blockchains and dapps, DAOs, and DACs, reality multiplicity is possible. There can be simultaneously existing parallel realities. The multiverse exists, and one place it can be created is in cyberspace.

Blockchains enable Simultaneous Multiple Realities
Just as blockchains are the critical enabling technology for digital cryptocurrencies, so too are they a key facilitator of VR multiverses. Blockchains could serve as the backbone infrastructure for multiple parallel realities (VR multiverses) by coordinating the chain of event histories in these multiple realities. The transaction history is not just for transactions, but more broadly comprises the historical event record. Blockchains consensus-generate the historical record, and allow any number of separate and parallel historical records to be created simultaneously. Blockchains are the mechanism for creating and coordinating simultaneous multiple realities. The altcoin space is already an example of simultaneous separate realities.

The Selectability of all Reality Features
Blockchains consensus-generate the historical record, and further, make it clear that all parameters of reality can be malleable and selectable: time, participation, reputation, memory, history (historicity), economic models (hierarchical or peer-based), and political operations (governance and decision-making). These are all selectable parameters of a reality environment. One recent revolution in economic liberation sensibility is that blockchains allow individuals and communities to self-determine economic systems. Now seen in the VR multiverse context, blockchains are revealed to be much more: they could enable all parameters of a reality environment to be selected.

Blocktime Time Malleability
One example of reality feature selectability is blocktime. The timeclock in blockchains is blocktime, the time it takes for blocks of transactions to confirm. The easiest way to specify future time moments (t+n) is via the internal time system of the blockchain, blocktime. For example, the term of a certain decentralized dapp loan might be 7000 block confirmations. Blocktime is the clocktime of blockchains. Certainly blocktime converts to physical world time, but differentials could arise and give way to arbitrage opportunities or other divergence-as-a-feature possibilities. The key point is that all reality parameters, including time and space, could become malleable in blockchains and especially in blockchain-coordinated VR metaverses. Further, if blockchains become the mechanism for keeping time and event histories, de facto they become memory, where memory is a critical functionality that feeds back directly into lifelogging and Brain-as-a-DAC idea chains.

A World of Multiple Realities
All of reality can be made malleable, personalized, self-determined, personally-constructed, emergent, and a thing of multiplicity not monolithicity. There can be an end to the tyranny of a sole reality. “End reality tyranny, create your own VR multiverse!” Deleuze's multiple inner views can bloom as described in Proust and Signs. In the new sensibility of VR multiverse reality multiplicity, an imaginable query to alien intelligence could be a Kardashev scale parameter: “To what extent do you have multiple realities in your world?”

Right to Self-Determine One’s Own Realities
The earlier positions in human liberation have been the right to self-determination in certain contexts, in different parts of life and the experience of reality. These include the right to self-determination in governance, legal systems, IP protection/sharing regimes, software business models, neural data privacy rights, cognitive enhancement, and most recently, the emerging sensibility of the right to self-determine one’s own economic systems. These are all important steps in the liberty of the individual, but they are all in some sense intermediary positions on the way to the now-visible bigger position which is the right to self-determine one’s own overall reality, and really, realities (plural). A new sensibility could be seeing the right of each individual, entity (human and machine/technology entities), or group to self-define its own personal consensus reality (realities). The central component of the self-determination of organisms could be the operation of its own consensus reality(ies).

Blockchains as a Historicity Mechanism and Collaboration Space 
Blockchains are a means for consensus-generating the historical record (a historicity mechanism) to facilitate reality multiplicity, and they are the means of enabling value flow. In network economic theory, this is beyond the transactional sense of the value flow of currency from me to you, where unleashing the creation and transmission of many kinds of non-monetary value flows is the bigger picture of what is at stake and possible in creating multiple realities. Non-monetary currencies (like universal human needs for connection, contribution, mattering, and understanding) can be registered and tracked as blockchain-based smart assets. One reason for VR realities, what we are really wanting in creating new realities (via VR multiverses) is creating spaces that are free of the limiting constraints of physical realities. These constraints pertain to both the physical world and human limitations, including matter, gravity, time, illness, disability, impairment, sleep, recovery, distraction, cognitive bias, etc.) such that more freedom, exploration, collaboration, expression, creativity, fun, serendipity, progress, and contribution can be enabled. We want to cognitively enhance proximately for a better memory, sure, but ultimately to be 'bigger' in the sense of being more able to grow and participate beyond our initial position of self. We want more of the creative yes-and collaboration space of new energy and idea generation. The ‘economy’ of the future might be measured based on non-monetary value flows like ideation, which could be orchestrated by public and private reality blockchains.

Convergent Integration of Multiple Simultaneous Realities
Now possibly having a situation of multiple simultaneous realities, what is there to do with them? There are several implications for the future of privacy, sharing, and collaboration. For example, there is a question about when and how to cohere and merge VR DAC brain realities. Therefore, within realities, there might be sub-threads or other means of parsing and segmenting sub-realities.
Colored coin threads in your brain DAC could be the way to permission subreddit mind ledgers to cloudmind collaborations
Mindchains could be a means for how to safely mindshare or collaborate in a cloudmind, for example by permissioning your subreddit ledger for ideation related to certain areas as opposed to your full mindfile or meat-brain….“here, let me share everything with you I’ve thought about crowdsourced genomic studies,” or "here, join the mindslack channel for this community."

Blockchain apps could auto-merge shared realities in the way that topical queries are ambiently processed in the background now. There could be situations analogous to Hayek’s competitive currencies where reality views compete. There could be reality ecologies where repetitive threads across individual realities converge into shared group realities (the unobtrusively representative politics of the future). Right now this happens manually with the blunt tools of the physical world; we search for other individuals, groups, and institutions with our own shared values and reality view, and blockchain DACs might facilitate the automatic canvassing and convergence of all of this.

We might know that VR metaverses and the human-machine collaboration are really working when VR NPC DACs self-create in our realities per sensing our human needs (actualization, contribution, growth and learning, exploration, creation). Blockchain-based VR AI DACs could auto-sense and create whatever 'Tuscany houses' are needed to grow an entity (like a human or machine mind) in its progression. For example, in an ideas 'economy,' the most important inputs are anything which facilitates the development of ideas, and attending to this could be one purpose of a an NPC VR AI DAC in your personal VR metaverse, operating via smart contracts on your mindchain. Ideas are the demurrage-redistributable basic income of a blockchain thinker Brain DAC. Blockchain thinker Brain DACs then become another Ubiquitous Grid Resource, an important one, for idea generation, in the overall picture of the future Network Economies of Abundance.

Acknowledgement: This post was inspired by ideas from Maciej Olpinski regarding consensus in virtual reality worlds.

[1] POV HUDs are a mechanism to accommodate multiple levels of technology adoption within a society; e.g.; through my HUD, I see unimproved nature and birds tweeting; through your HUD, you see propositional nanotech 3-D printed finery self-mutating in utility fogs.

          Deep Learning Artificial Intelligence Part 2        
Deep Learning Artificial Intelligence. The newest surge in AI design has companies and investors in a scramble to get the best. But do veterans of the field agree with the surge in popularity? Yoshua Bengio, Full Professor of the Department of Computer Science & Operations Research at the University of Montreal, definitely thinks deep learning … Continue reading
          Artificial Intelligence and Deep Learning        
My name is Michael Wozniak. I am a programming / game development teacher at Thinnox. The most exciting and flourishing area in technology right now, in my opinion, is Artificial Intelligence and Deep Learning. Everything from the newest self-driving cars to the numerous Twitter bot experiments uses artificial intelligence strategies called Deep Learning. Deep … Continue reading
          æ·±åº¦å­¦ä¹ å®žéªŒå®¤å’Œç ”究组        
http://deeplearning.net/deep-learning-research-groups-and-labs/ Deep Learning Research Groups: some labs and research groups that are actively working on deep learning: University of Toronto - Machine Learning Group (Geoffrey Hinton, Rich Zemel, Rusla…
          SICK AG: Development Engineer, Software - Algorithmics (m/f) Machine Learning/3D Graphics/Data Fusion        
Design, implementation, and application of novel machine learning and deep learning algorithms
          Comment on I spy with my computer vision eye… Wally? Waldo? by Rick Searle        
Here is the NY Times article I mentioned: http://www.nytimes.com/2012/11/24/science/scientists-see-advances-in-deep-learning-a-part-of-artificial-intelligence.html?_r=0 And a skeptical rejoinder: http://www.newyorker.com/online/blogs/newsdesk/2012/11/is-deep-learning-a-revolution-in-artificial-intelligence.html What's your take?
          Easier, faster: The next steps for deep learning        

If there is one subset of machine learning that spurs the most excitement, that seems most like the intelligence in artificial intelligence, it’s deep learning. Deep learning frameworks—aka deep neural networks—power complex pattern-recognition systems that provide everything from automated language translation to image identification.

Deep learning holds enormous promise for analyzing unstructured data. There are just three problems: It’s hard to do, it requires large amounts of data, and it uses lots of processing power. Naturally, great minds are at work to overcome these challenges.  

What’s now brewing in this space isn’t just a clash of supremacy between competing deep learning frameworks, such as Google’s TensorFlow versus projects like Baidu’s Paddle. Rivalry between multiple software frameworks is a given in most any part of IT.



          What deep learning really means        

Perhaps the most positive technical theme of 2016 was the long-delayed triumph of artificial intelligence, machine learning, and in particular deep learning. In this article we'll discuss what that means and how you might make use of deep learning yourself.

Perhaps you noticed in the fall of 2016 that Google Translate suddenly went from producing, on the average, word salad with a vague connection to the original language to emitting polished, coherent sentences more often than not -- at least for supported language pairs, such as English-French, English-Chinese, and English-Japanese. That dramatic improvement was the result of a nine-month concerted effort by the Google Brain and Google Translate teams to revamp Translate from using its old phrase-based statistical machine translation algorithms to working with a neural network trained with deep learning and word embeddings employing Google's TensorFlow framework.
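
The paragraph above describes the general recipe (word embeddings feeding a neural network trained with a framework like TensorFlow) rather than any published model. Purely as a hedged illustration of that idea, here is a tiny Keras sketch in which an embedding layer feeds a recurrent encoder; the layer sizes and the random toy batch are invented and have nothing to do with Google's actual Translate system.

```python
# Minimal sketch of "word embeddings + neural network"; NOT Google's Translate model.
# Sizes and data below are made up for illustration.
import numpy as np
import tensorflow as tf

vocab_size, embed_dim, seq_len = 5000, 64, 20   # assumed toy values

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embed_dim),          # word embeddings
    tf.keras.layers.LSTM(128),                                  # recurrent encoder over the embedded sequence
    tf.keras.layers.Dense(vocab_size, activation="softmax"),    # e.g. predict the next token
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Toy batch: 32 random token-id sequences and a random target token for each.
x = np.random.randint(0, vocab_size, size=(32, seq_len))
y = np.random.randint(0, vocab_size, size=(32,))
model.fit(x, y, epochs=1, verbose=0)
```

A real translation system pairs such an encoder with a decoder and attention and trains on millions of sentence pairs; the sketch only shows how embeddings plug into a network.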



          Comment on on sharing work in progress and anonymity by Chris Bourg        
This is terrific - thanks for it. I will offer one piece of somewhat self-interested data in response to this: "And what about books? Publishers cannot afford to print materials that people can get for free on the Internet." The MIT Press, oversight of which is part of my purview as Director of MIT Libraries, published Deep Learning (https://mitpress.mit.edu/books/deep-learning) as a free online version and a reasonably priced print version, and the print version is a best seller. OA online and reasonably priced print versions can co-exist.
          A Beginner’s Guide to Deep Learning        

Byte Academy

As long as there has been Science Fiction, the idea that machines are smarter than humans has always captivated our collective imagination. But while Artificial Intelligence (AI) has not reached that level yet, we have made considerable advances towards developing machine intelligence, as evidenced by Google, Tesla, and Uber experimenting with self-driven cars. The reason why AI […]



          The Machine Learning and Artificial Intelligence Bundle for $39        
Learn the Mathematics & Algorithms Behind the Next Great Tech Frontier with These 11 Instructive Hours
Expires December 16, 2021 23:59 PST
Buy now and get 91% off

Easy Natural Language Processing (NLP) in Python


KEY FEATURES

Over this course you will build multiple practical systems using natural language processing (NLP), the branch of machine learning and data science that deals with text and speech. You'll start with a background on NLP before diving in, building a spam detector and a model for sentiment analysis in Python. Learning how to build these practical tools will give you an excellent window into the mechanisms that drive machine learning.

  • Access 19 lectures & 2 hours of content 24/7
  • Build a spam detector & sentiment analysis model that may be used to predict the stock market
  • Learn practical tools & techniques like the natural language toolkit library & latent semantic analysis
  • Create an article spinner from scratch that can be used as an SEO tool
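
For readers wondering what such a spam detector amounts to in code, the following is a minimal scikit-learn sketch rather than the course's own material; the four example messages and their labels are invented.

```python
# Minimal bag-of-words spam detector sketch (not the course's code); toy data invented here.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["win a free prize now", "lowest price meds here",
         "meeting at 10am tomorrow", "lunch with the team today"]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = ham

clf = make_pipeline(CountVectorizer(), MultinomialNB())  # vectorize words, then Naive Bayes
clf.fit(texts, labels)
print(clf.predict(["free prize meeting"]))  # predicted label for a new message
```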
Think this is cool? Check out the other bundles in this series, The Deep Learning and Artificial Intelligence Introductory Bundle, and The Advanced Guide to Deep Learning and Artificial Intelligence.

PRODUCT SPECS

Details & Requirements

  • Length of time users can access this course: lifetime
  • Access options: web streaming, mobile streaming
  • Certification of completion not included
  • Redemption deadline: redeem your code within 30 days of purchase
  • Experience level required: all levels, but knowledge of Python and Numpy coding is expected
  • All code for this course is available for download here, in the directory nlp_class

Compatibility

  • Internet required

THE EXPERT

The Lazy Programmer is a data scientist, big data engineer, and full stack software engineer. For his master's thesis he worked on brain-computer interfaces using machine learning. These assist non-verbal and non-mobile persons to communicate with their family and caregivers.

He has worked in online advertising and digital media as both a data scientist and big data engineer, and built various high-throughput web services around said data. He has created new big data pipelines using Hadoop/Pig/MapReduce, and created machine learning models to predict click-through rate, news feed recommender systems using linear regression, Bayesian Bandits, and collaborative filtering and validated the results using A/B testing.

He has taught undergraduate and graduate students in data science, statistics, machine learning, algorithms, calculus, computer graphics, and physics for students attending universities such as Columbia University, NYU, Humber College, and The New School.

Multiple businesses have benefitted from his web programming expertise. He does all the backend (server), frontend (HTML/JS/CSS), and operations/deployment work. Some of the technologies he has used are: Python, Ruby/Rails, PHP, Bootstrap, jQuery (Javascript), Backbone, and Angular. For storage/databases he has used MySQL, Postgres, Redis, MongoDB, and more.

Unsupervised Machine Learning Hidden Markov Models in Python


KEY FEATURES

Data, in many forms, is presented in sequences: stock prices, language, credit scoring, etc. Being able to analyze them, therefore, is of invaluable importance. In this course you'll learn a machine learning algorithm - the Hidden Markov Model - to model sequences effectively. You'll also delve deeper into the many practical applications of Markov Models and Hidden Markov Models.

  • Access 40 lectures & 4.5 hours of content 24/7
  • Use gradient descent to solve for the optimal parameters of a Hidden Markov Model
  • Learn how to work w/ sequences in Theano
  • Calculate models of sickness & health
  • Analyze how people interact w/ a website using Markov Models
  • Explore Google's PageRank algorithm
  • Generate images & discuss smartphone autosuggestions using HMMs

PRODUCT SPECS

Details & Requirements

  • Length of time users can access this course: lifetime
  • Access options: web streaming, mobile streaming
  • Certification of completion not included
  • Redemption deadline: redeem your code within 30 days of purchase
  • Experience level required: all levels, but knowledge of Python and Numpy coding is expected
  • All code for this course is available for download here, in the directory hmm_class

Compatibility

  • Internet required


Cluster Analysis and Unsupervised Machine Learning in Python


KEY FEATURES

Cluster analysis is a staple of unsupervised machine learning and data science, used extensively for data mining and big data because it automatically finds patterns in data. The real-world applications for this process, then, are vital, making people who can implement cluster analyses a hot commodity in the business world. In this course, you'll become a master of clustering.

  • Access 22 lectures & 1.5 hours of content 24/7
  • Discuss k-means clustering & hierarchical clustering
  • Explore Gaussian mixture models & kernel density estimation
  • Create your own labels on clusters
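
A minimal sketch of the kind of k-means clustering the course covers, run on synthetic data with scikit-learn (not course material):

```python
# Minimal k-means sketch on synthetic data (not the course's code).
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)   # toy 2-D points
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(labels[:10])  # cluster assignment for the first few points
```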

PRODUCT SPECS

Details & Requirements

  • Length of time users can access this course: lifetime
  • Access options: web streaming, mobile streaming
  • Certification of completion not included
  • Redemption deadline: redeem your code within 30 days of purchase
  • Experience level required: all levels, but knowledge of Python and Numpy coding is expected
  • All code for this course is available for download here, in the directory unsupervised_class

Compatibility

  • Internet required


Data Science: Supervised Machine Learning in Python


KEY FEATURES

Machine learning is entering the scientific mainstream faster than ever, being utilized to do tasks as diverse as analyzing medical images, driving cars automatically, and everything in between. Google has even announced that machine learning is one of their top focuses of innovation, making it an invaluable subject to begin studying now. In this course, you'll dive into the basics of machine learning, the theory behind it, and its many practical applications so you can be on the forefront of a new technological wave.

  • Access 33 lectures & 3 hours of content 24/7
  • Discuss the K-Nearest Neighbor algorithm, its concepts, & implement it in code
  • Explore the Naive Bayes Classifier & General Bayes Classifier
  • Learn about Decision Trees
  • Dive into the Perceptron algorithm, the ancestor of neural networks & deep learning
  • Understand more practical machine learning topics like hyperparameters, cross-validation, feature extraction, feature selection, & multiclass classification
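
As a small illustration of the K-Nearest Neighbor and cross-validation topics listed above (again, not the course's code), here is a scikit-learn sketch on the classic iris dataset:

```python
# Minimal K-Nearest-Neighbors + cross-validation sketch (not the course's code).
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
for k in (1, 5, 15):                          # the hyperparameter being tuned
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    print(k, scores.mean())                   # pick k by held-out accuracy
```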

PRODUCT SPECS

Details & Requirements

  • Length of time users can access this course: lifetime
  • Access options: web streaming, mobile streaming
  • Certification of completion not included
  • Redemption deadline: redeem your code within 30 days of purchase
  • Experience level required: all levels, but knowledge of Python and Numpy coding is expected
  • All code for this course is available for download here, in the directory unsupervised_class

Compatibility

  • Internet required


          The Universal Basic Share        
Universal basic income, or UBI for those acronymically minded, is in the news these days, along with other brilliant post-modern inventions such as Brexit or Trump. Unlike these other luminaries, though, UBI is a genuinely cool idea: give everyone a basic amount to spend, and let them do what they will with it. They could write poetry, compose sonatas, or study number theory. They could work for more income if they wanted. Or they could relax and do absolutely nothing.

UBI is the offspring of a beautiful dream: the liberation of the human being from the drudgery of everyday labor. But it is also the product of a scary thought: the trend of ever-advancing automation, now accelerated many-fold by new deep learning algorithms.

You see the connection, of course. If a bunch of creepy robots are going to pass the Turing test at the call center, or drive up shinily when you hail an Über, or stack boxes even as they are energetically prevented from doing so, or even dance while doing the dishes, you'd better find something better to do with your labor time.

So UBI is a nice gesture; it sends you on your way with a little stash of income that you can do with as you please. But of course, a little stash multiplied by the population ends up -- not surprisingly -- in a big stash. Try giving everyone in the United States $10,000 each annually, and you will see that the required payout comes to a cool 3 trillion per year, which is in excess of three-quarters the annual federal budget. Yikes!
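
The back-of-the-envelope arithmetic behind that sentence, with rough 2016 figures assumed for the US population and the federal budget:

```python
# Rough arithmetic behind the "cool 3 trillion" claim; figures are approximate 2016 values.
us_population = 320e6          # ~320 million people
ubi_per_person = 10_000        # dollars per year
federal_budget = 3.9e12        # ~$3.9 trillion annual federal budget

total = us_population * ubi_per_person
print(total / 1e12, "trillion dollars per year")   # ~3.2
print(total / federal_budget)                      # ~0.8, i.e. over three-quarters
```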

Nevertheless, the idea has found serious purchase in Europe: the Swiss even voted in June on UBI of around $2500 per month per adult. It didn't pass --- in fact it was turned down by a large margin -- but a serious warning shot had been fired. Finland and the Netherlands are planning to trial UBI by following a group of lucky recipients around and seeing what they do with their monthly payments. (Though somehow the thought of being stalked by a group of randomistas asking how I am spending my UBI is weird; I know what I'd tell them.)

You would think that the UBI is a good idea for rich countries. But there is also a prima facie case for trying it in a country like India, which one way or the other has been making very large transfers for decades.  Just the public distribution scheme for foodgrain represents a subsidy of around 1.4% of GDP, but if you add to this the subsidies on fertilizer, transportation, water, electricity and other goods, we are up to well over 4% of GDP. Then there are the so-called "revenues foregone" through various exemptions, chiefly via relief on excise and customs duty, that will take you into the region of another 6% of GDP. We're now up to 10% and counting, and we're counting because these are just in the domain of the Central Government; there are more subsidies at the State level, and there are other implicit subsidies via sub-market pricing of public sector goods. (See also Santosh Dash's comment below.)  I'm not counting large sources of social expenditure, such as education and health, nor the national rural employment guarantee scheme,  which provides every rural household the right to 100 days of work at a basic wage. Here's an illuminating note on central Indian subsidies put together by Siddharth Hari, a doctoral candidate at NYU.

These subsidies are often greatly lamented, largely on the right, by individuals who blame them for all sorts of bad outcomes. One favorite lament is that there are big leakages due to corruption. Another is that subsidies are often mis-targeted (over and above the corruption) to the non-poor. And the libertarian spirit typically completes this tri-headed litany: why should the Government tell us what to eat, or how many health checkups to have? And what is it doing in the food distribution or transportation business, or in any business for that matter? Why not just hand out plain unvarnished -- and presumably untarnished -- cash instead to everyone, and be done with it?

I want to refrain from engaging in that debate here, but the bottom line is this: talk of a universal cash transfer that replaces a system of multifarious, nefarious transfers has long been in the Indian air.  So it comes as no particular surprise to learn that careful, long-standing observers of the Indian economy have promptly added two and two to ask: can we cobble together a basic, unconditional, universal income for all of India's citizens?

You might justifiably and indignantly ask: unconditional and universal? Why should the rich also be treated to free income? Answer: try targeting, and the leaky bucket will emerge again, spilling copiously. But isn't a universal transfer the logical equivalent of a bucket with basically no bottom at all? Perhaps, but then we'd spend all our time issuing and examining BPL cards, and given the massive corruption and incompetence in the bureaucracy, you may as well give everyone the money and save us the headache. Well, ok, but I don't feel like providing Ambani with an assured UBI. Oh, we can get around a lot of that by requiring that the claimant must show up in person bearing an identity card to claim her income. Ambani won't show up in person. A lot of the rich won't show up. But who's to say that the bureaucrat won't claim that they did show up? And so on and so forth. Or you could attack all of this from another direction: won't the poor squander their cash cavorting and drinking, as the poor apparently do? Then the usual arguments about paternalism can start up. There is no end to this.

In this post, I am going to tentatively accept the idea of universality and non-paternalism, and look at the other elephants in the room (alas, there's a veritable herd of them):

1. The promise of a UBI can be inflated away. Who's going to make sure this thing is properly indexed to rising prices, and what if it's not? An unsympathetic Government can erode all the promises --- all the subsidies and the transfers that were so clumsily but irrevocably made in kind --- and make them vanish into thin air in a matter of years. (With inflation at 5%, a nominal commitment in fixed rupees will halve -- in real value -- in 14 years.)

2. The commitment looks really huge (sorry, I seem to have inadvertently quoted Trump).   In 2014, the Rangarajan Committee submitted its report proposing a monthly poverty line of Rs. 972 and Rs. 1407 (urban). With rural population shares taken into account, that's a bit north of Rs. 13,000 ($200) per year per person. A pittance? Yes. But multiply by India's population of 1.25 billion and you're at around 12% of India's GDP ($2.09 trillion in 2015).  If you want to cut that back to Rs 10,000 per year (or around $150), you're at 9% of GDP. So there you have it, ladies and gentlemen: 9-12% of GDP to bring every man, woman and child up to speed, or at least walking pace.
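
Writing the same arithmetic out explicitly, using the figures quoted above and an assumed exchange rate of about Rs 65 per dollar:

```python
# Back-of-envelope UBI cost for India, using the figures quoted in the text.
population = 1.25e9            # people
gdp_usd = 2.09e12              # India's 2015 GDP in dollars
rs_per_usd = 65                # assumed exchange rate

for ubi_rs in (13_000, 10_000):                       # per person per year
    cost_usd = population * ubi_rs / rs_per_usd
    print(ubi_rs, "Rs/yr ->", round(100 * cost_usd / gdp_usd, 1), "% of GDP")
# prints roughly 12% and 9%, matching the text
```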

Is this do-able? It all depends on whether those huge subsidies to the non-poor can be removed. Pranab Bardhan writes:

"[T]he Indian government doles out significantly more than [10% of GDP] in implicit or explicit subsidies to better-off sections of the population, not to mention tax exemptions to the corporate sector. By discontinuing some or all of these subsidies – which, of course, do not include expenditures in areas like health, education, nutrition, rural and urban development programs, and environmental protection – the government could secure the funds to offer everyone, rich and poor, a reasonable basic income."

There's some more optimism expressed by Abhijit Banerjee and by Guy Standing, but the political economy of subsidy removal does look menacing, to say the least. Central government expenditure as a share of GDP has been declining since 2010; this year it will be a bit more than 13%. That matches the demands that UBI would make, which isn't comforting at all. Nor is it comforting that no one pays taxes in India. In a more pessimistic piece, Maitreesh Ghatak concludes that:

"A universal cash transfer scheme is therefore not feasible without raising additional taxes. Not just that, given that only 1 per cent of Indians actually pay income tax, while a mere 2.3 per cent file tax returns, the fiscal instruments to claw back the transfer from the rich do not exist."

It does seem like we're on a dramatic edge here, and a lot must hang on whether existing subsidies can be credibly removed.

3. For my last elephant, let's go back for a moment to this whole automation business. Some years ago, I observed in this post  (a tad gloomily) that:

"to avoid the ever widening capital-labor inequality as we lurch towards an automated world, all its inhabitants must ultimately own shares of physical capital. Whether this can successfully happen or not is an open question. I am pessimistic, but the deepest of all long-run policy implications lies in pondering this question."

I've italicized the phrase I want to emphasize here: if we're truly headed towards automation, it is not enough to pay out UBI and let a small group of residual claimants eagerly divvy up the remaining surplus. Even with indexation to inflation, the UBI is a fixed commitment. What happens, then, as profits continue to rise in business? Is no share to be passed on to the population? Will class warfare be reduced to annual debates about how to adjust the UBI?

In the rest of this article, I'm going to propose a simple amendment of UBI that holds out serious hope for dealing with all of these issues and more. I'm going to call it the universal basic share, or UBS. Simply put, the UBS is a commitment that is expressed, not as a sum of money, but as a share. Specifically, I propose that we commit a fixed fraction of our GDP to the provision of a universal income for all.

Consider six merits of this proposal, not necessarily in order of importance:

A. It is country-neutral. It can be introduced into every country, rich or poor. It scales up or down with country-level income.

B. We can start small.  In the Indian example, the numbers do not have to be at Rs. 10,000 to begin with. But over time, they will get there. In this sense, the proposal takes (some) care of the debate that we "cannot afford it."

C. The UBI commits a government to pay out a fixed sum, come hell or high water. In contrast, UBS insulates against shocks to the fiscal system that are correlated with GDP shocks.  (Given the amounts involved, one might imagine even rich governments being risk-averse.) But the upside to the general public will be enormous.

D. The UBS does not need to be indexed at all. It's fixed as a share of nominal GDP,  and that will automatically take care of any indexing that's needed.

[Update 1: A UBI can be indexed in India using the dearness allowance, which is a cost-of-living adjustment based on the cost-of-living index and paid out to public-sector employees and pensioners. Maybe, though in countries where inflation statistics are dodgy I'd be wary of this. I'd be wary of formula manipulations in India as well, once a truly enormous commitment such as UBI is on the table. In any case, I am after more than mere indexation; see point F below.]

E. The UBS will create an incentive for a majority to demand a better tax collection and auditing system. And the government, too, would be incentivized to close off its tax loopholes. For India, this is a first-order issue.

F. The UBS allows everyone to share in the prosperity of a country.  To me, this aspect of equity-sharing is --- in the longer run --- the most important feature of the UBS. It is our protection against unbounded inequality as we move into an increasingly automated universe.

To implement a UBS, the most important thing is to get the share right. Giving everyone Rs. 10,000 per year takes us to about 9% of GDP. But it's not enough to leave it there; we need a sense of what this looks like as a fraction of government expenditure. This is an extremely tricky business. Let me illustrate with India, which --- given its existing slew of explicit and implicit subsidies --- is possibly one of the most difficult examples out there. (Fair warning:  I have the back of an envelope out as I speak, so the numbers below would need to be refined.) 

The central government's expenditure share as a percentage of GDP is a bit shy of 14% in 2014-2015. But central and state expenditure combined is double that: around 27% in 2014-2015 (here for the gory details). For revenue foregone and other implicit subsidies, which we would need to take back, add on another 6-10%. That gets us to about 35%. So to access 9% of GDP as UBS, we would need to contribute 25% of government expenditure, inclusive of all subsidies, to the cause.
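
The same envelope again: the share of combined government spending (including implicit subsidies, the rough 35% built up above) that a UBS of 9% of GDP would absorb:

```python
# How much of combined government expenditure a UBS of 9% of GDP would absorb.
ubs_share_of_gdp = 0.09
govt_spend_share_of_gdp = 0.35   # centre + states + implicit subsidies, as estimated above
print(round(ubs_share_of_gdp / govt_spend_share_of_gdp, 2))  # ~0.26, about a quarter
```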

[Update 2: an alternative is to commit UBS directly as a share of government expenditure, which is the form in which I originally suggested it. The linking of basic income to overall prosperity then is less direct. Moreover, as Pranab Bardhan, Karna Basu and Siddharth Hari have pointed out to me,  the government could suffer from possible disincentives in raising expenditure, fearing that part of the increase would be "taxed off" by the UBS. Though in view of the deficit, such fears might be a blessing in disguise. On the other hand, the government would gain better insurance: if there is a sudden fiscal crisis, even one that's independent of a GDP shock, its commitment to UBS would adjust accordingly.]

Can we really usher in the right to a UBS? I have no clue whether we have the political will to pull something like this off. But remember: it's a share that's being committed. At Indian rates of growth and with an improving fiscal system, we can get the resulting numbers to double in 10-12 years, and double again a decade after that. So if we want to start smaller, we can entertain that thought.

Some postscripts on the UBS:

(i) If you want to institute a share, do it when you start the program. Once a number is fixed, no one wants to move towards a share as it looks risky. With a share to begin with --- where there was nothing before --- matters can be very different.

(ii)  For each year, the payout assessment will need to be done. This can be done using the previous year's GDP (or expenditure, in case the variant is tried) and dividing by population estimates. Uncollected payouts --- and hopefully there will be a lot of those --- can go into an insurance endowment or be otherwise used.

(iii) [Update 3.] After I wrote this, Rajiv Sethi pointed me to Robert Shiller's proposal to issue trills, which is a government-issued security that would pay a share --- in trillionths, hence "trill" --- of GDP. Yes! A UBS is certainly viewable as a variant of a gigantic, collectively held trill --- a plain bill, then, perhaps? Look here for a related proposal by Rajiv to hold individual bank accounts at the Fed. In keeping with the adage that there's nothing new under the sun, Ugo Colombino pointed out the connection to the citizen's dividend, which is a form of UBS based on natural resources; Alaska implements a form of this as the Alaska Permanent Fund. In the words of Thomas Paine, "men did not make the earth." Rahul Basu told me about the efforts --- inspired by Alaska's fund --- to secure a permanent fund in Goa. Read here about such a movement, and read here about the Supreme Court directive to create a Goa Iron Ore Permanent Fund.

Hey Switzerland, want to try again?

Thanks: Pranab Bardhan, Karna Basu, Rahul Basu, Ugo Colombino, Parikshit Ghosh, Siddharth Hari, Aditya Kuvalekar, and  Rajiv Sethi.











          Bottom-up and top-down in drug discovery        
There are two approaches to discovering new drugs. In one approach drugs fall in your lap from the sky. In the other you scoop them up from the ocean. Let’s call the first the top-down approach and the second the bottom-up approach.

The bottom-up approach assumes that you can discover drugs by thinking hard about them, by understanding what makes them tick at the molecular level, by deconstructing the dance of atoms orchestrating their interactions with the human body. The top-down approach assumes that you can discover drugs by looking at their effects on biological systems, by gathering enough data about them without understanding their inner lives, by generating numbers through trial and error, by listening to what those numbers are whispering in your ear.

To a large extent, the bottom-up approach assumes knowledge while the top-down approach assumes ignorance. Since human beings have been ignorant for most of their history, for most of the recorded history of drug discovery they have pursued the top-down approach. When you don't know what works, you try things out randomly. The Central Americans found out by accident that chewing the bark of the Cinchona plant relieved them of the afflictions of malaria. Through the Middle Ages and beyond, people who called themselves physicians prescribed a witches' brew of substances ranging from sulfur to mercury to arsenic to try to cure a corresponding witches' brew of maladies, from consumption to the common cold. More often than not these substances killed patients as readily as the diseases themselves.

The top-down approach may seem crude and primitive, and it was primitive, but it worked surprisingly well. For the longest time it was exemplified by the ancient medical systems of China and India – one of these systems delivered an antimalarial medicine that helped its discoverer bag a Nobel Prize for Medicine. Through fits and starts, scores of failures and a few solid successes, the ancients discovered many treatments that were often lost to the dust of ages. But the philosophy endured. It endured right up to the early 20th century when the German physician Paul Ehrlich tested over 600 chemical compounds - products of the burgeoning dye industry pioneered by the Germans - and found that compound 606 worked against syphilis. Syphilis was a disease that so bedeviled people since medieval times that it was often a default diagnosis of death, and cures were desperately needed. Ehrlich's 606 was arsenic-based, unstable and had severe side effects, but the state of medicine was such back then that anything was regarded as a significant improvement over the previous mercury-based compounds.

It was with Ehrlich's discovery that drug discovery started to transition to a more bottom-up discipline, systematically trying to make and test chemical compounds and understand how they worked at the molecular level. But it still took decades before the approach bore fruition. For that we had to await a nexus of great and concomitant advances in theoretical and synthetic organic chemistry, spectroscopy and cell and molecular biology. These advances helped us figure out the structure of druglike organic molecules, they revealed the momentous fact that drugs work by binding to specific target proteins, and they also allowed us to produce these proteins in useful quantity and uncover their structures. Finally at the beginning of the 80s, we thought we had enough understanding of chemistry to design drugs by bottom-up approaches, "rationally", as if everything that had gone on before was simply the product of random flashes of unstructured thought. The advent of personal computers (Apple and Microsoft had launched in the late 70s) and their immense potential left people convinced that it was only a matter of time before drugs were "designed with computers". What the revolution probably found inconvenient to discuss much was that it was the top-down analysis which had preceded it that had produced some very good medicines, from penicillin to thorazine.

Thus began the era of structure-based drug design which tries to design drugs atom by atom from scratch by knowing the protein glove in which these delicate molecular fingers fit. The big assumption is that the hand that fits the glove can deliver the knockout punch to a disease largely on its own. An explosion of scientific knowledge, startups, venture capital funding and interest from Wall Street fueled those heady times, with the upbeat understanding that once we understood the physics of drug binding well and had access to more computing power, we would be on our way to designing drugs more efficiently. Barry Werth's book "The Billion-Dollar Molecule" captured this zeitgeist well; the book is actually quite valuable since it's a rare as-it-happens study and not a more typical retrospective one, and therefore displays the same breathless and naive enthusiasm as its subjects.

And yet, 30 years after the prophecy was enunciated in great detail and to great fanfare, where are we? First, the good news. The bottom-up approach did yield great dividends - most notably in the field of HIV protease inhibitor drugs against AIDS. I actually believe that this contribution from the pharmaceutical industry is one of the greatest public services that capitalism has performed for humanity. Important drugs for lowering blood pressure and controlling heartburn were also the beneficiaries of bottom-up thinking.

The bad news is that the paradigm fell short of the wild expectations that we had from it. Significantly short in fact. And the reason is what it always has been in the annals of human technological failure: ignorance. Human beings simply don't know enough about perturbing a biological system with a small organic molecule. Biological systems are emergent and non-linear, and we simply don't understand how simple inputs result in complex outputs. Ignorance was compounded with hubris in this case. We thought that once we understood how a molecule binds to a particular protein and optimized this binding, we had a drug. But what we had was simply a molecule that bound better to that protein; we still worked on the assumption that that protein was somehow critical for a disease. Also, a molecule that binds well to a protein has to overcome enormous other hurdles of oral bioavailability and safety before it can be called a drug. So even if - and that's a big if - we understood the physics of drug-protein binding well, we still wouldn't be any closer to a drug, because designing a drug involves understanding its interactions with an entire biological system and not just with one or two proteins.

In reality, diseases like cancer manifest themselves through subtle effects on a host of physiological systems involving dozens if not hundreds of proteins. Cancer especially is a wily disease because it activates cells for uncontrolled growth through multiple pathways. Even if one or two proteins were the primary drivers of this process, simply designing a molecule to block their actions would be too simplistic and reductionist. Ideally we would need to block a targeted subset of proteins to produce optimum effect. In reality, either our molecule would not bind even one favored protein sufficiently and lack efficacy, or it would bind the wrong proteins and show toxicity. In fact the reason why no drug can escape at least a few side effects is precisely because it binds to many other proteins other than the one we intended it to.

Faced with this wall of biological complexity, what do we do? Ironically, what we had done for hundreds of years, only this time armed with far more data and smarter data analysis tools. Simply put, you don't worry about understanding how exactly your molecule interacts with a particular protein; you worry instead only about its visible effects, about how much it impacts your blood pressure or glucose levels, or how much it increases urine output or metabolic activity. These endpoints are agnostic of knowledge of the detailed mechanism of action of a drug. You can also compare these results across a panel of drugs to try to decipher similarities and differences.

This is top-down drug design and discovery, writ large in the era of Big Data and techniques from computer science like machine learning and deep learning. The field is fundamentally steeped in data analysis and takes advantage of new technology that can measure umpteen effects of drugs on biological systems, greatly improved computing power and hardware to analyze these effects, and refined statistical techniques that can separate signal from noise and find trends.

The top-down approach is today characterized mainly by phenotypic screening and machine learning. Phenotypic screening involves simply throwing a drug at a cell, organ or animal and observing its effects. In its primitive form it was used to discover many of today's important drugs; in the field of anxiety medicine for instance, new drugs were discovered by giving them to mice and simply observing how much fear the mice exhibited toward cats. Today's phenotypic screening can be more fine-grained, looking at drug effects on cell size, shape and elasticity. One study I saw looked at potential drugs for wound healing; the most important tool in that study was a high-resolution camera, and the top-down approach manifested itself through image analysis techniques that quantified subtle changes in wound shape, depth and appearance. In all these cases, the exact protein target the drug might be interacting with was a distant horizon and an unknown. The large scale, often visible, effects were what mattered. And finding patterns and subtle differences in these effects - in images, in gene expression data, in patient responses - is what the universal tool of machine learning is supposed to do best. No wonder that every company and lab from Boston to Berkeley is trying feverishly to recruit data and machine learning scientists and build burgeoning data science divisions. These companies have staked their fortunes on a future that is largely imaginary for now.
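
As a deliberately simplified sketch of that pipeline, and not any particular study's code, the snippet below feeds a few invented image-derived measurements per treated well to an off-the-shelf classifier; real phenotypic screens use far richer descriptors and much larger panels.

```python
# Toy phenotypic-screening sketch: image-derived features -> compound activity label.
# Feature names and data are invented; real studies use far richer image descriptors.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_wells = 200
# pretend these were measured from images of treated cells or wounds
features = rng.normal(size=(n_wells, 3))   # e.g. wound area, edge roughness, cell density
active = (features[:, 0] + 0.5 * features[:, 1]
          + rng.normal(scale=0.5, size=n_wells)) > 0   # synthetic "active compound" label

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print(cross_val_score(clf, features, active, cv=5).mean())  # how separable the actives are
```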

Currently there seems to be, if not a war, at least a simmering and uneasy peace between top-down and bottom-up approaches in drug discovery. And yet this seems to be mainly a fight where opponents set up false dichotomies and straw men rather than find complementary strengths and limitations. First and foremost, the ultimate proof of the pudding is in the eating, and machine learning's impact on the number of approved new drugs still has to be demonstrated; the field is simply too new. The constellation of techniques has also proven itself to be much better at solving certain problems (mainly image recognition and natural language processing) than others. A lot of early stage medicinal chemistry data contains messy assay results and unexpected structure-activity relationships (SAR) containing "activity cliffs" in which a small change in structure leads to a large change in activity. Machine learning struggles with these discontinuous stimulus-response landscapes. Secondly, there are still technical issues in machine learning such as working with sparse data and noise that have to be resolved. Thirdly, while the result of a top-down approach may be a simple image or change in cell type, the number of potential factors that can lead to that result can be hideously tangled and multifaceted. Finally, there is the perpetual paradigm of garbage-in-garbage-out (GIGO). Your machine learning algorithm is only as good as the data you feed it, and chemical and biological data are notoriously messy and ill-curated; chemical structures might be incorrect, assay conditions might differ in space and time, patient reporting and compliance might be sporadic and erroneous, human error riddles data collection, and there might be very little data to begin with. The machine learning mill can only turn data grist into gold if what it's provided with is grist in the first place.

In contrast to some of these problems with the top-down paradigm, bottom-up drug design has some distinct advantages. First of all, it has worked, and nothing speaks like success. Also operationally, since you are usually looking at the interactions between a single molecule and protein, the system is much simpler and cleaner, and the techniques to study it are less prone to ambiguous interpretation. Unlike machine learning which can be a black box, here you can understand exactly what's going on. The amount of data might be smaller, but it may also be more targeted, manageable and reproducible. You don't usually have to deal with the intricacies of data fitting and noise reduction or the curation of data from multiple sources. Ultimately at the end of the day, if like HIV protease your target does turn out to be the Achilles heel of a deadly disease like AIDS, your atom-by-atom design can be as powerful as Thor's hammer. There is little doubt that bottom-up approaches have worked in selected cases, where the relevance of the target has been validated, and there is little doubt that this will continue to be the case.

Now it's also true that just like with top-downers, bottom-uppers have had their burden of computational problems and failures, and both paradigms have been subjected to their fair share of hype. Starting from that "designing drugs using computers" headline in 1981, people have understood that there are fundamental problems in modeling intermolecular interactions: some of these problems are computational and in principle can be overcome with better hardware and software, but others like the poor understanding of water molecules and electrostatic interactions are fundamentally scientific in nature. The downplaying of these issues and the emphasizing of occasional anecdotal successes has led to massive hype in computer-aided drug design. But in case of machine learning it's even worse in some sense since hype from applications of the field in other human endeavors is spilling over in drug discovery too; it seems hard for some to avoid claiming that your favorite machine learning system is going to soon cure cancer if it's making inroads in trendy applications like self-driving cars and facial recognition. Unlike machine learning though, the bottom-up take has at least had 20 years of successes and failures to draw on, so there is a sort of lid on hype that is constantly waved by skeptics.

Ultimately, the biggest advantage of machine learning is that it allows us to bypass detailed understanding of complex molecular interactions and biological feedback and work from the data alone. It's like a system of psychology that studies human behavior purely based on stimuli and responses of human subjects, without understanding how the brain works at a neuronal level. The disadvantage is that the approach can remain a black box; it can lead to occasional predictive success but at the expense of understanding. And a good open question is to ask how long we can keep on predicting without understanding. Knowing how many unexpected events or "Black Swans" exist in drug discovery, how long can top-down approaches keep performing well?

The fact of the matter is that both top-down and bottom-up approaches to drug discovery have strengths and limitations and should therefore be part of an integrated approach to drug discovery. In fact they can hopefully work well together, like members of a relay team. I have heard of at least one successful major project in a leading drug firm in which top down phenotypic screening yielded a valuable hit which then, midstream, was handed over to a bottom-up team of medicinal chemists, crystallographers and computational chemists who deconvoluted the target and optimized the hit all the way to an NDA (New Drug Application). At the same time, it was clear that the latter would not have been made possible without the former. In my view, the old guard of the bottom-up school has been reluctant and cynical in accepting membership in the guild for the young Turks of the top-down school, while the Turks have been similarly guilty of dismissing their predecessors as antiquated and irrelevant. This is a dangerous game of all-or-none in the very complex and challenging landscape of drug discovery and development, where only multiple and diverse approaches are going to allow us to discover the proverbial needle in the haystack. Only together will the two schools thrive, and there are promising signs that they might in fact be stronger together. But we'll never know until we try.

(Image: BenevolentAI)

          These three very different structural elements were designed to carry the same load        

Dinotopia artist Jim Gurney says: "Computer modeling tools such as ZBrush and Maya have made it easier to visualize whatever form a human designer imagines. And 3D printing has made it possible to translate that design into physical form."

The generative process yields dozens or even hundreds of options, and the human can select which one to produce.

This new enterprise is variously called "deep-learning generative design," "intuitive AI design," and "algorithmic design." New plugins for Maya have already made such technology available.

The designs generated by this process look like something out of Art Nouveau.

They look biological, resembling skeletal architecture, with curving shapes. As with biological forms there are no straight lines and no right angles. There's no consideration of style. They're not made to look beautiful but rather to be efficient. Generative designs are vastly lighter and stronger than human designs.

The forms are often surprisingly complex, apparently more intricate than they need to be. They're not necessarily easy to produce without a 3D printer.
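
A toy sketch of the generate-and-select loop described above: many random candidate designs are scored by a stand-in function for weight and strength, and only the lightest ones that still carry the required load are offered to the human. The scoring model and parameters are invented placeholders, not a real structural solver.

```python
# Toy "generative design" loop: propose many candidates, keep the lightest that carry the load.
# score() is a stand-in for a real finite-element / structural analysis step.
import random

def score(candidate):
    thickness, n_ribs = candidate
    weight = thickness * 2.0 + n_ribs * 0.4          # invented weight model
    strength = thickness * 3.0 + n_ribs * 1.1        # invented strength model
    return weight, strength

required_strength = 10.0
candidates = [(random.uniform(0.5, 5.0), random.randint(0, 12)) for _ in range(500)]
feasible = [c for c in candidates if score(c)[1] >= required_strength]
options = sorted(feasible, key=lambda c: score(c)[0])[:10]   # lightest feasible designs
print(options)  # the human designer chooses among these
```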


          Movidius Neural Compute Stick: Artificial intelligence in stick form from Intel        

Artificial intelligence is an especially hot topic in the technology world, with a great many companies focusing on it, since beyond being hot it is also expected to be tremendously profitable in the future. Intel's latest product sets out to deliver artificial intelligence through a simple USB stick.

Movidius had unveiled the Fathom Neural Compute Stick in April 2016, aiming to offer AI capabilities for interested users' systems in the most accessible and affordable form possible: that of a USB stick. The plug-and-play deep learning accelerator was due to ship in the winter of 2016, but Intel went on to acquire Movidius in September 2016.

With the acquisition now complete, the Fathom Neural Compute Stick is finally finding its way to market, with a new look and a new name. It is now called the Movidius Neural Compute Stick and comes in the blue that is characteristic of Intel. It integrates the Myriad 2 VPU (vision processing unit), which can deliver 100 gigaflops of performance at just 1 W of power consumption. And if someone needs more compute, they only have to plug additional sticks into their system's USB ports.

The Movidius Neural Compute Stick can run neural networks in real time and enables AI applications without requiring an internet connection. Not needing the internet results in particularly low latency and better protection of personal data.

The Movidius Neural Compute Stick is available immediately for $79.

          Computer Accurately Describes Breast Cancers on Digital Tissue Slides        
A deep-learning computer network has been designed by a research team from Case Western Reserve University to determine whether invasive forms of breast cancer are present in whole biopsy slides.
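
The announcement gives no code; purely as an illustration of the usual approach (a small convolutional network classifying patches tiled from whole-slide images), here is a hedged Keras sketch whose architecture, patch size, and random data are assumptions, not the Case Western model.

```python
# Illustrative patch-level CNN for "invasive vs. not" classification; NOT the published model.
import numpy as np
import tensorflow as tf

patch = (64, 64, 3)   # assumed patch size cut from a whole-slide image
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=patch),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # probability the patch contains invasive tumor
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Random stand-in patches and labels; real work would tile labeled whole-slide images.
x = np.random.rand(16, *patch).astype("float32")
y = np.random.randint(0, 2, size=(16, 1))
model.fit(x, y, epochs=1, verbose=0)
```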

          A visit to Amazon's headquarters in Seattle: The invisible giant        

Amazon dominates a large share of online retail, builds its own products, and is trying its hand at financial technologies such as Amazon Pay. But that is only the facade. Behind it stands a corporation pursuing a much bigger vision. t3n paid the giant a visit.

I can't see it. Slightly confused, I am standing on the sidewalk in the middle of Seattle, in the Westlake district. Google Maps tells me I am in the right place. But all I can make out are a few buildings, autumnal, Christmassy Starbucks advertising for pumpkin spice lattes, and rows of trees with red and yellow leaves. One of the world's largest online retailers is supposed to be right here, or more precisely Amazon's campus, the headquarters. But I don't see it.

It is odd, really. I am looking for perhaps the most important e-commerce retailer in the world, a corporation that turned over around 92 billion US dollars in the first nine months of 2016, employed more than 300,000 people in 2015, and counts more than 300 million customers worldwide. And I cannot spot a single logo on the buildings around me.

Free bananas instead of glass palaces

It takes a banana stand to untangle my confusion. I know the "Community Banana Stand" from my own research, a free-banana campaign run by Amazon. So I cannot be entirely wrong. I discover the next clue on the building at 440 Terry Avenue North. A sign there reads "Day One North", an allusion to the Jeff Bezos quote "It's still day one", as Amazon connoisseurs know. I open the door and step into the lobby, and there, at last, the first logo is on display.

Here in Seattle, the strategy of founder Jeff Bezos can perhaps be recognized best: Amazon is everywhere, but you do not see the company. Google, Apple, Facebook: they all build huge corporate headquarters in Silicon Valley, one central point of contact for everything, a pilgrimage site for the faithful, each bigger and more pompous than the last. Not so Bezos. There is no real campus. His company is spread across the entire city in a total of 30 buildings; two more skyscrapers stand in the south, and a third is currently under construction. Not a single one of them reveals from the outside that Amazon can be found inside. "It is not our style to be flashy, so we do not need our logos on all of our buildings," they say at the reception desk.

At its home base the company presents itself as subtle, unobtrusive, even reserved. That is not coyness but a reflection of the corporate philosophy: completely inconspicuous yet single-minded, Amazon permeates not only Seattle but also the everyday lives of millions of people around the world. Selling goods online is merely the most obvious part of it. Customers find everything there, from fresh groceries to tools. With Amazon Echo and the Kindle reader the company is also trying its hand at in-house hardware, with Prime it is pushing into the streaming market, and with Amazon Pay into financial technology.

The entrance area of Day 1 North looks more like a coffee-shop counter than the lobby of the world's largest e-commerce retailer. At least there are treats for the dogs that employees are allowed to bring along. (Photo: Jochen Fuchs)
Behind the scenes, however, Amazon does not only serve consumers; with its marketplace it has built up a huge B2B platform. Merchants can sell their products through it as independent sellers; companies such as the startup KW-Commerce base entire business models on it. And in product search, Amazon displaced Google long ago. The cloud business also illustrates how much the company's portfolio has grown. With Amazon Web Services (AWS) the corporation generated just under nine billion US dollars in the first nine months of 2016, which makes the division roughly ten percent of total revenue. Above all, though, AWS dominates the infrastructure market: with 45 percent, Amazon holds more market share than Microsoft, Google, and IBM combined.

The eternal first day

But that is not enough for Jeff Bezos. The Amazon founder constantly pushes his employees to keep developing new ideas. The quote "It's still day one" illustrates this particularly well. Employees are supposed to think every day as if it were still the company's first day, as if everything could still be discovered anew, as if everything could still be overturned. Take risks and seize opportunities that offer a chance at future market leadership. The customer is the focus, because that ultimately pays into the Amazon brand. "With everything we do, we want to offer customers added value," is how Patrick Gauthier, vice president of Amazon Pay, sums it up. The company, he says, follows no secret master plan; it identifies customer needs.

Seattle and the surrounding area serve as the playground for this experimentation. The delivery service Prime Now, which by now delivers products the same day worldwide, was first tried out in Washington State's largest city. This is also where the Amazon Go supermarket launched, in which customers, thanks to walk-out technology, can simply pack up goods and leave again without ever stepping up to a checkout. Payment happens automatically through the Amazon app. Using cameras, sensors, and deep-learning algorithms, it knows when the customer leaves the store and the purchase is complete.

Was neues mit Büchern

Und hier in Seattle steht auch der erste Amazon Bookstore, eine Annäherung des Konzerns an die analoge Welt. Auf den ersten Blick sieht das Geschäft in der University Village wie ein ganz normaler Buchladen aus, mit dunklen Ledersesseln und hochwertigen Echtholzmöbeln. Die Besonderheiten scheinen erst auf den zweiten Blick durch: Das Sortiment ist datengetrieben und anhand der Kundenvorlieben in Seattle zusammengestellt. Die Preise sind nicht an den Büchern notiert, sondern können per Amazon-App abgefragt werden. Die Regale zeigen nicht nur Genres, sondern auch „Bestseller-Sachbücher im Nordwesten“, „Die populärsten ersten Comics für Anfänger“, „Höchstbewertet mit 4,8 Sternen und mehr“ und „Wenn Ihnen ‚Zero to One‘ gefällt, gefällt Ihnen auch das hier“. Und wer bezahlen will, kann das mit seiner Amazon-App an der Kasse machen. Alles wie im Onlineshop – nur eben in echt.

Alongside the books, Amazon puts its own hardware front and center. The children's section features Fire tablets in the Kids Edition, and customers can leaf through books on the Kindle reader. Dedicated shelves and televisions present everything from Amazon Echo to the Fire TV Stick. The concept seems to have proven itself. Within a year, Amazon has also opened a bookstore at Washington Square in the state of Oregon and in Westfield in California. Stores in New York, New Jersey, Illinois, and Massachusetts are set to follow.

That the online pioneer, of all companies, now sells goods offline may come as a surprise. For Amazon, however, it is just another logical step toward the consumer. "It's not about online versus offline, it's about the customer experience," says Amazon Pay vice president Gauthier. "From Amazon's point of view, the customer is unprejudiced in how they use channels." What matters more, he says, is that all channels are intelligently connected with one another, as in the Amazon Bookstore.

The three glass domes currently going up at the Amazon headquarters are to house microclimate zones for more than 300 plants from all over the world. Employees, too, are meant to find a workplace in the greenery here. (Photo: Jochen Fuchs)
Not everything succeeds, despite all the financial and data muscle: take the Treasure Truck, a lorry wrapped in illuminated Amazon lettering. With this Willy Wonka version of the ice cream truck and an accompanying app, the company runs flash sales: only a single product is offered per day, a paddle boat for example, or a high-quality porterhouse steak, at a low price. The retro-styled truck apparently has not really taken off: so far it only makes its rounds in Seattle. But that is a small and easily absorbed project for the online giant.

An employee named "Robo-Stow"

More important for the company is a business unit in DuPont, about an hour's drive away. In the middle of the countryside, near Mount Rainier National Park, Amazon has built a 100,000-square-meter high-tech logistics center. There, the online retailer is working on the future of logistics. The strongest "employee" in the warehouse is called "Robo-Stow": a six-ton yellow robotic arm that hoists entire pallets seven meters into the air and sets them down on a self-driving vehicle, which in turn transports the pallet autonomously through a landscape of conveyor belts into the storage area. So that humans do not get in the way, fixed markings on the floor show them which paths they may move along; leaving them means risking an accident.

Amazon's reliance on automated helpers also has to do with efficiency. A robot-driven logistics center can store considerably more goods than a conventional one. Still, it does not do entirely without humans: from an initial 350 employees in 2014, the number has risen to 750 today. Even with flat orange robots darting back and forth under the racks and sorting articles into the shelves, the employees remain indispensable: packing, stowing, and loading trucks are still the things they do best.

At the moment, though, there is no talk of cutbacks at Amazon anyway, and not just in the warehouses. The company plans to hire 100,000 new employees in the US over the coming 18 months, a third of today's headcount. If only three interviews were held for each position, that would mean 800 job interviews per day, the Handelsblatt has calculated. How many of the newcomers will end up in Seattle remains to be seen. Amazon currently employs 27,000 people at its home base, with an estimated 1,000 new hires added every month. Once the third tower in the south is finished, Amazon will have room for around 55,000 employees.

A while ago there were rumors that the headquarters might move from downtown to the cheaper suburbs. With the construction of the third tower in the south, at the latest, those rumors have gone quiet. Jeff Bezos chose downtown quite deliberately. The variety offered by the surrounding shops, restaurants, and food trucks is supposed to have a positive effect on the Amazonians, and they in turn on the local economy. The company does without its own canteens and break rooms; there are only a few cafeterias on the campus. Employees are expected to go out and eat in the neighborhood.

Another quirk of the online retailer: while other companies like Google and Facebook spoil their employees with free snacks, Amazon holds back. Apart from water and coffee, employees get nothing for free; in the cafeterias they pay for everything themselves. "People who work at Amazon do it because they enjoy it," one Amazon employee says. "Not because of the free goodies." If you want a Coke, you buy one. End of story.

That does not mean Amazon has no company culture, just a different one. Pets, for example, enjoy a permanent right of residence in the offices. The workspaces now host 2,000 small animals, most of them dogs. Treats for them sit on every reception desk in the lobby. The enthusiasm goes so far that the new Amazon prestige building, the biosphere "The Spheres," is currently being built under the code name "Rufus II," an homage to the first office dog.

Broomball, the career game

As I walk through one of the cafeterias, I spot a picture gallery of proud people presenting strange brooms and wearing athletic clothing. I stop. "Quidditch?" I ask. My guide laughs. "No, broomball." The sport was invented in Amazon's early days, even before the first "Harry Potter" books came out. The online retailer's appetite for innovation does not stop at sports, either.

Using taped-together brooms, teams, together with Jeff Bezos, drive a large inflatable ball across a field, traditionally on "Picnic Day," the company outing. What sounds like fun is pursued with the utmost ambition: obscure internet legends whisper that Bezos takes the event so seriously that success and failure in the game can influence careers.

The first Amazon office. Even today the company keeps a deliberately low profile at its headquarters, without any coyness. (Photo: Amazon)
The fact that I am given these glimpses does not change the fact that only a narrow crack into the world of Amazon is opened for me. A residue of reticence remains. When I ask what exactly is inside this or that building, which business unit, which projects, my Amazon guide often has no answer. Perhaps the constant change makes it hard to keep track: business units are founded or dissolved, and new buildings spring up within a few months. Perhaps the constant change simply does not allow anyone to know Amazon inside and out. Or perhaps they just do not want to reveal it.

Perhaps Jeff Bezos could have answered my questions; unfortunately, I did not meet him in Seattle. When I jokingly ask my guide about the founder, he laughs, but tells me that Bezos is still heavily involved in the day-to-day business and regularly works at the Seattle headquarters. In theory, you could run into the boss anywhere.

Just like Amazon itself.


          GPU Benchmark: GTX 1080 vs. Titan X        

GTX 1080 vs Titan X

For professional gamers, the names GTX 1080 and Titan X stand for the current top-of-the-line graphics cards (GPUs) on the market, and they are therefore the goal for any self-respecting gamer. Both cards are made by Nvidia and are everything you could want from a graphics card: high performance and a high-end hardware configuration.

Technical specifications

As mentioned, the hardware configuration of these cards is top-notch and, as we will see below, even though the differences between the two GPUs are not large, only one of them turned out to be clearly the more performant after all the tests it was put through.

                    GTX 1080       Titan X
CUDA cores          2560           3584
Base clock          1.6 GHz        1.42 GHz
Boost clock         1.73 GHz       1.53 GHz
Memory              8 GB GDDR5X    12 GB GDDR5X
Memory bandwidth    320 GB/s       480 GB/s
TDP                 180 W          250 W
Processor           GP104          GP102
Transistors         7.2bn          12bn

Price

The prices of these cards reflect the performance they can deliver, and current prices are very high. The Titan X goes for about 1,200 euros, while the GTX 1080 costs around 800 euros.

Amazon offers

We have picked out two interesting Amazon offers for these two little gems of technology. If you are a videogame fan and only like playing at maximum resolution with maximum smoothness, one of these cards could be the perfect gift to put under the tree this Christmas.

Benchmark

Now we get to the heart of this article: the results these two cards posted in the various GPU benchmarks. The buzz around test results had already started this summer, because until then the only known figures for the GTX 1080 and the Titan X were the lab numbers published by Nvidia itself. The numbers were interesting, but anyone who follows this kind of news knows that measured results usually deviate from the manufacturer's claims, so as soon as it became possible to get hold of these two Nvidia cards, several online hardware portals pitted them against each other with the classic 3DMark test. The 3DMark results for the two cards can be consulted at this url, and the ranking is reported below. 3D Mark: Titan X vs GTX 1080.

This kind of comparison is perfectly adequate for deciding which card will perform better for gaming, but not everyone knows that top-end cards like the Titan X and the GTX 1080 also have applications in newer disciplines such as Machine Learning or, more specifically, Deep Learning. As it happens, while searching the web for news on the subject, we discovered that an Italian company working in these fields, Add-For of Turin, recently published its own study on the performance of these two cards applied to Deep Learning algorithms and tested with several libraries such as TensorFlow, Caffe, and Neon. The results obtained in these Deep Learning benchmarks came out slightly different, crowning the Titan X as the undisputed queen of graphics cards.

These are without doubt two cards with very high capabilities, able to delight any gamer. If, beyond gaming, you are also interested in more professional applications such as Deep Learning, then the Titan X is the optimal choice! Add-For will soon release new benchmarks for graphics cards such as the Nvidia Tesla K40 and K80, which will be installed in high-performance HPC systems.
          #9: Deep Learning (Adaptive Computation and Machine Learning)        
Deep Learning (Adaptive Computation and Machine Learning)
Ian Goodfellow , Yoshua Bengio , Aaron Courville

Buy new: EUR 67.99
41 offers from EUR 62.50

          Nuts and Bolts of Building Deep Learning Applications: Ng @ NIPS2016        
You might go to a cutting-edge machine learning research conference like NIPS hoping to find some mathematical insight that will help you take your deep learning system's performance to the next level. Unfortunately, as Andrew Ng reiterated to a live crowd of 1,000+ attendees this past Monday, there is no secret AI equation that will let you escape your machine learning woes. All you need is some rigor, and much of what Ng covered in his remarkable NIPS 2016 presentation, titled "The Nuts and Bolts of Building Applications using Deep Learning," is not rocket science. Today we'll dissect the lecture and Ng's key takeaways. Let's begin.

Figure 1. Andrew Ng delivers a powerful message at NIPS 2016.


Andrew Ng and the Lecture
Andrew Ng's lecture at NIPS 2016 in Barcelona was phenomenal -- truly one of the best presentations I have seen in a long time. In a juxtaposition of two influential presentation styles, the CEO-style and the Professor-style, Andrew Ng mesmerized the audience for two hours. Andrew Ng's wisdom from managing large-scale AI projects at Baidu, Google, and Stanford really shows. In his talk, Ng spoke to the audience and discussed one of the key challenges facing most of the NIPS audience -- how do you make your deep learning systems better? Rather than showing off new research findings from his cutting-edge projects, Andrew Ng presented a simple recipe for analyzing and debugging today's large-scale systems. With no need for equations, a handful of diagrams, and several checklists, Andrew Ng delivered a two-whiteboards-in-front-of-a-video-camera lecture, something you would expect at a group research meeting. However, Ng made sure not to delve into Research-y areas, which are likely to make your brain fire on all cylinders but make you and your company very few dollars in the foreseeable future.

Money-making deep learning vs Idea-generating deep learning
Andrew Ng highlighted the fact that while NIPS is a research conference, many of the newly generated ideas are simply ideas, not yet battle-tested vehicles for converting mathematical acumen into dollars. The bread and butter of money-making deep learning is supervised learning, with recurrent neural networks such as LSTMs in second place. Research areas such as Generative Adversarial Networks (GANs), Deep Reinforcement Learning (Deep RL), and just about anything branding itself as unsupervised learning are simply Research, with a capital R. These ideas are likely to influence the next 10 years of Deep Learning research, so it is wise to focus on publishing and tinkering if you really love such open-ended Research endeavours. Applied deep learning research is much more about taming your problem (understanding the inputs and outputs), casting the problem as a supervised learning problem, and hammering it with ample data and ample experiments.

"It takes surprisingly long time to grok bias and variance deeply, but people that understand bias and variance deeply are often able to drive very rapid progress." 
--Andrew Ng 


The 5-step method of building better systems
Most issues in applied deep learning come from a training-data / testing-data mismatch. In some scenarios this issue just doesn't come up, but you'd be surprised how often applied machine learning projects use training data (which is easy to collect and annotate) that is different from the target application. Andrew Ng's discussion is centered around the basic idea of the bias-variance tradeoff. You want a classifier with a good ability to fit the data (low bias is good) that also generalizes to unseen examples (low variance is good). Too often, applied machine learning projects running at scale forget this critical dichotomy. Here are the four numbers you should always report:
  • Training set error
  • Testing set error
  • Dev (aka Validation) set error
  • Train-Dev (aka Train-Val) set error

Andrew Ng suggests following the following recipe:


Figure 2. Andrew Ng's "Applied Bias-Variance for Deep Learning Flowchart"
for building better deep learning systems.


Take all of your data, split it into 60% for training and 40% for testing. Use half of the test set for evaluation purposes only, and the other half for development (aka validation). Now take the training set, leave out a little chunk, and call it the training-dev data. This 4-way split isn't always necessary, but consider the worst case where you start with two separate sets of data, and not just one: a large set of training data and a smaller set of test data. You'll still want to split the testing into validation and testing, but also consider leaving out a small chunk of the training data for the training-validation. By comparing the error on the training set vs the training-dev set, you measure the "variance."

Figure 3. Human-level vs Training vs Training-dev vs Dev vs Test. 
Taken from Andrew Ng's 2016 talk.


In addition to these four accuracies, you might want to report the human-level accuracy, for a total of 5 quantities to report. The difference between human-level and training set performance is the Bias. The difference between the training set and the training-dev set is the Variance. The difference between the training-dev and dev sets is the train-test mismatch, which is much more common in real-world applications than you'd think. And finally, the difference between the dev and test sets measures the degree of overfitting to the dev set.
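
To make the recipe concrete, here is a minimal Python sketch of turning those five reported numbers into Ng's diagnostic quantities (the error values are hypothetical placeholders, not numbers from the talk):

    # Minimal sketch of Ng's bias/variance diagnostic; error values are made up.
    errors = {
        "human": 0.01,      # human-level error (a proxy for the best achievable error)
        "train": 0.05,      # training set error
        "train_dev": 0.09,  # error on the held-out chunk of training data
        "dev": 0.16,        # dev (validation) set error
        "test": 0.17,       # test set error
    }

    bias = errors["train"] - errors["human"]           # avoidable bias
    variance = errors["train_dev"] - errors["train"]   # generalization gap
    mismatch = errors["dev"] - errors["train_dev"]     # train/test data mismatch
    dev_overfit = errors["test"] - errors["dev"]       # overfitting to the dev set

    print(bias, variance, mismatch, dev_overfit)
    # The largest gap tells you what to fix first: bias -> bigger model or longer
    # training; variance -> more data or regularization; mismatch -> make the
    # training data look more like the data you actually care about.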

Nowhere in Andrew Ng's presentation does he mention how to use unsupervised learning, but he does include a brief discussion about "Synthesis." Such synthesis ideas are all about blending pre-existing data or using a rendering engine to augment your training set.

Conclusion
If you want to lose weight, gain muscle, and improve your overall physical appearance, there is no magical protein shake and no magical bicep-building exercise. The fundamentals such as reduced caloric intake, getting adequate sleep, cardiovascular exercise, and core strength exercises like squats and bench presses will get you there. In this sense, fitness is just like machine learning -- there is no secret sauce. I guess that makes Andrew Ng the Arnold Schwarzenegger of Machine Learning.

What you are most likely missing in your life is the rigor of reporting a handful of useful numbers such as performance on the 4 main data splits (see Figure 3). Analyzing these numbers will let you know if you need more data or better models, and will ultimately let you hone in your expertise on the conceptual bottleneck in your system (see Figure 2).

With a prolific research track record that never ceases to amaze, we all know Andrew Ng as one hell of an applied machine learning researcher. But the new Andrew Ng is not just another data-nerd. His personality is bigger than ever -- more confident, more entertaining, and his experience with a large number of academic and industrial projects makes him much wiser. With enlightening lectures such as "The Nuts and Bolts of Building Applications with Deep Learning," Andrew Ng is likely to be an individual whose future keynotes you might not want to miss.

Appendix
You can watch a September 27th, 2016 version of the Andrew Ng Nuts and Bolts of Applying Deep Learning Lecture on YouTube, which he delivered at the Deep Learning School. If you are working on machine learning problems in a startup, then definitely give the video a watch. I will update the video link once/if the newer NIPS 2016 version shows up online.

You can also check out Kevin Zakka's blog post for ample illustrations and writeup corresponding to Andrew Ng's entire talk.






          Making Deep Networks Probabilistic via Test-time Dropout        
In Quantum Mechanics, Heisenberg's Uncertainty Principle states that there is a fundamental limit to how well one can measure a particle's position and momentum. In the context of machine learning systems, a similar principle has emerged, but relating interpretability and performance. By using a manually wired or shallow machine learning model, you'll have no problem understanding the moving pieces, but you will seldom be happy with the results. Or you can use a black-box deep neural network and enjoy the model's exceptional performance. Today we'll see one simple and effective trick to make our deep black boxes a bit more intelligible. The trick allows us to convert neural network outputs into probabilities, with no cost to performance, and minimal computational overhead.

Interpretability vs Performance: Deep Neural Networks perform well on most computer vision tasks, yet they are notoriously difficult to interpret.

The desire to understand deep neural networks has triggered a flurry of research into Neural Network Visualization, but in practice we are often forced to treat deep learning systems as black-boxes. (See my recent Deep Learning Trends @ ICLR 2016 post for an overview of recent neural network visualization techniques.) But just because we can't grok the inner-workings of our favorite deep models, it doesn't mean we can't ask more out of our deep learning systems.

There exists a simple trick for upgrading black-box neural network outputs into probability distributions. 

The probabilistic approach provides confidences, or "uncertainty" measures, alongside predictions and can make almost any deep learning system into a smarter one. For robotic applications or any kind of software that must make decisions based on the output of a deep learning system, being able to provide meaningful uncertainties is a true game-changer.


Applying Dropout to your Deep Neural Network is like occasionally zapping your brain
The key ingredient is dropout, an anti-overfitting deep learning trick handed down from Hinton himself (see Krizhevsky's pioneering 2012 paper). Dropout randomly sets some of the unit activations to zero during training, reducing feature co-adaptation and thus improving generalization.
Without dropout, it is too easy to make a moderately deep network attain 100% accuracy on the training set. 
The accepted knowledge is that an un-regularized network (one without dropout) is too good at memorizing the training set. For a great introductory machine learning video lecture on dropout, I highly recommend you watch Hugo Larochelle's lecture on Dropout for Deep learning.


Geoff Hinton's dropout lecture, also a great introduction, focuses on interpreting dropout as an ensemble method. If you're looking for new research ideas in the dropout space, a thorough understanding of Hinton's interpretation is a must.


But while dropout is typically used at training-time, today we'll highlight the keen observation that dropout used at test-time is one of the simplest ways to turn raw neural network outputs into probability distributions. Not only does this probabilistic "free upgrade" often improve classification results, it provides a meaningful notion of uncertainty, something typically missing in Deep Learning systems.
The idea is quite simple: to estimate the predictive mean and predictive uncertainty, simply collect the results of stochastic forward passes through the model using dropout. 

How to use dropout: 2016 edition

  1. Start with a moderately sized network
  2. Increase your network size with dropout turned off until you perfectly fit your data
  3. Then, train with dropout turned on
  4. At test-time, turn on dropout and run the network T times to get T samples
  5. The mean of the samples is your output and the variance is your measure of uncertainty

Remember that drawing more samples will increase computation time during testing unless you're clever about re-using partial computations in the network. Please note that if you're only using dropout near the end of your network, you can reuse most of the computations. If you're not happy with the uncertainty estimates, consider adding more layers of dropout at test-time. Since you'll already have a pre-trained network, experimenting with test-time dropout layers is easy.
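
As a rough illustration of steps 4 and 5, here is a minimal PyTorch-style sketch of Monte Carlo dropout at test time (the toy model and the choice of T are my own assumptions, not code from the papers discussed below):

    import torch
    import torch.nn as nn

    # A toy classifier with dropout; any network containing dropout layers works the same way.
    model = nn.Sequential(
        nn.Linear(128, 64), nn.ReLU(),
        nn.Dropout(p=0.5),
        nn.Linear(64, 10),
    )

    def predict_with_uncertainty(model, x, T=50):
        # Keep dropout stochastic at test time and collect T forward passes.
        # (model.train() also affects BatchNorm; if your net has BatchNorm layers,
        # switch only the dropout modules into training mode instead.)
        model.train()
        with torch.no_grad():
            samples = torch.stack([torch.softmax(model(x), dim=-1) for _ in range(T)])
        mean = samples.mean(dim=0)   # predictive mean: use this as the output
        var = samples.var(dim=0)     # per-class variance: use this as the uncertainty
        return mean, var

    mean, var = predict_with_uncertainty(model, torch.randn(1, 128))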

Bayesian Convolutional Neural Networks

To be truly Bayesian about a deep network's parameters, we wouldn't learn a single set of parameters w, we would infer a distribution over weights given the data, p(w|X,Y). Training is already quite expensive, requiring large datasets and expensive GPUs.
Bayesian learning algorithms can in theory provide much better parameter estimates for ConvNets and I'm sure some of our friends at Google are working on this already. 
But today we aren't going to talk about such full Bayesian Deep Learning systems, only systems that "upgrade" the model prediction y to p(y|x,w). In other words, only the network outputs gain a probabilistic interpretation.

An excellent deep learning computer vision system which uses test-time dropout comes from a recent University of Cambridge technique called SegNet. The SegNet approach introduced an Encoder-Decoder framework for dense semantic segmentation. More recently, SegNet includes a Bayesian extension that uses dropout at test-time for providing uncertainty estimates. Because the system provides a dense per-pixel labeling, the confidences can be visualized as per-pixel heatmaps. Segmentation system is not performing well? Just look at the confidence heatmaps!

Bayesian SegNet. A fully convolutional neural network architecture which provides 
per-pixel class uncertainty estimates using dropout.


The Bayesian SegNet authors tested different strategies for dropout placement and determined that a handful of dropout layers near the encoder-decoder bottleneck is better than simply using dropout near the output layer. Interestingly, Bayesian SegNet improves the accuracy over vanilla SegNet. Their confidence maps show high uncertainty near object boundaries, but different test-time dropout schemes could provide a more diverse set of uncertainty estimates.

Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding Alex Kendall, Vijay Badrinarayanan, Roberto Cipolla, in arXiv:1511.02680, November 2015. [project page with videos]


Confidences are quite useful for evaluation purposes, because instead of providing a single average result across all pixels in all images, we can sort the pixels and/or images by the overall confidence in prediction. When evaluating the top 10% most confident pixels, we should expect significantly higher performance. For example, the Bayesian SegNet approach achieves 75.4% global accuracy on the SUN RGBD dataset, and an astonishing 97.6% on the most confident 10% of the test set [personal communication with Bayesian SegNet authors]. This kind of sort-by-confidence evaluation was popularized by the PASCAL VOC Object Detection Challenge, where precision/recall curves were the norm. Unfortunately, as the research community moved towards large-scale classification, the notion of confidence was pushed aside. Until now.

Theoretical Bayesian Deep Learning

Deep networks that model uncertainty are truly meaningful machine learning systems. It turns out that we don't really have to understand how a deep network's neurons process image features to trust the system to make decisions. As long as the model provides uncertainty estimates, we'll know when the model is struggling. This is particularly important when your network is given inputs that are far from the training data.

The Gaussian Process: A machine learning approach with built-in uncertainty modeling

In a recent ICML 2016 paper, Yarin Gal and Zoubin Ghahramani develop a new theoretical framework casting dropout training in deep neural networks as approximate Bayesian inference in deep Gaussian processes. Gal's paper gives a complete theoretical treatment of the link between Gaussian processes and dropout, and develops the tools necessary to represent uncertainty in deep learning. They show that a neural network with arbitrary depth and non-linearities, with dropout applied before every weight layer, is mathematically equivalent to an approximation to the probabilistic deep Gaussian process. I have yet to see researchers use dropout between every layer, so the discrepancy between theory and practice suggests that more research is necessary.

Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning Yarin Gal, Zoubin Ghahramani, in ICML. June 2016. [Appendix with relationship to Gaussian Processes]
A Theoretically Grounded Application of Dropout in Recurrent Neural Networks Yarin Gal, in arXiv:1512.05287. May 2016.
What My Deep Model Doesn't Know. Yarin Gal. Blog Post. July 2015 
Homoscedastic and Heteroscedastic Regression with Dropout Uncertainty. Yarin Gal. Blog Post. February 2016.

Test-time dropout is used to provide uncertainty estimates for deep learning systems.

In conclusion, maybe we can never get both interpretability and performance when it comes to deep learning systems. But we can all agree that providing confidences, or uncertainty estimates, alongside predictions is always a good idea. Dropout, the very same regularization trick used to battle overfitting in deep models, shows up yet again. Sometimes all you need is to add some random variations to your input, and average the results over many trials. Dropout lets you wiggle not only the network inputs but the entire architecture.

I do wonder what Yann LeCun thinks about Bayesian ConvNets... Last I heard, he was allergic to sampling.

Related Posts 
Deep Learning vs Probabilistic Graphical Models vs Logic April 2015
Deep Learning Trends @ ICLR 2016 June 2016


          Deep Learning Trends @ ICLR 2016        
Started by the youngest members of the Deep Learning Mafia [1], namely Yann LeCun and Yoshua Bengio, the ICLR conference is quickly becoming a strong contender for the single most important venue in the Deep Learning space. More intimate than NIPS and less benchmark-driven than CVPR, the world of ICLR is arXiv-based and moves fast.



Today's post is all about ICLR 2016. I’ll highlight new strategies for building deeper and more powerful neural networks, ideas for compressing big networks into smaller ones, as well as techniques for building “deep learning calculators.” A host of new artificial intelligence problems is being hit hard with the newest wave of deep learning techniques, and from a computer vision point of view, there's no doubt that deep convolutional neural networks are today's "master algorithm" for dealing with perceptual data.

Deep Powwow in Paradise? ICLR 2016 was held in Puerto Rico. 

Whether you're working in Robotics, Augmented Reality, or dealing with a computer vision-related problem, the following summary of ICLR research trends will give you a taste of what's possible on top of today's Deep Learning stack. Consider today's blog post a reading group conversation-starter.

Part I: ICLR vs CVPR
Part II: ICLR 2016 Deep Learning Trends
Part III: Quo Vadis Deep Learning?


Part I: ICLR vs CVPR

Last month's International Conference of Learning Representations, known briefly as ICLR 2016, and commonly pronounced as "eye-clear," could more appropriately be called the International Conference on Deep Learning. The ICLR 2016 conference was held May 2nd-4th 2016 in lovely Puerto Rico. This year was the 4th installment of the conference -- the first was in 2013 and it was initially so small that it had to be co-located with another conference. Because it was started by none other than the Deep Learning Mafia, it should be no surprise that just about everybody at the conference was studying and/or applying Deep Learning Methods. Convolutional Neural Networks (which dominate image recognition tasks) were all over the place, with LSTMs and other Recurrent Neural Networks (used to model sequences and build "deep learning calculators") in second place. Most of my own research conference experiences come from CVPR (Computer Vision and Pattern Recognition), and I've been a regular CVPR attendee since 2004. Compared to ICLR, CVPR has a somewhat colder, more empirical feel. To describe the difference between ICLR and CVPR, Yann LeCun, quoting Raquel Urtasun (who got the original saying from Sanja Fidler), put it best on Facebook.

CVPR: What can Deep Nets do for me?
ICLR: What can I do for Deep Nets?

The ICLR 2016 conference was my first official powwow that truly felt like a close-knit "let's share knowledge" event. 3 days of the main conference, plenty of evening networking events, and no workshops. With a total attendance of about 500, ICLR is about 1/4 the size of CVPR. In fact, CVPR 2004 in D.C. was my first conference ever, and CVPRs are infamous for their packed poster sessions, multiple sessions, and enough workshops/tutorials to make CVPRs last an entire week. At the end of CVPR, you'll have a research hangover and will need a few days to recuperate. I prefer the size and length of ICLR.

CVPR and NIPS, like many other top-tier conferences heavily utilizing machine learning techniques, have grown to gargantuan sizes, and paper acceptance rates at these mega conferences are close to 20%. It is not necessarily true that the research papers at ICLR were any more half-baked than some CVPR papers, but the amount of experimental validation for an ICLR paper makes it a different kind of beast than CVPR. CVPR's main focus is to produce papers that are 'state-of-the-art' and this essentially means you have to run your algorithm on a benchmark and beat last season's leading technique. ICLR's main focus is to highlight new and promising techniques in the analysis and design of deep convolutional neural networks, initialization schemes for such models, and the training algorithms to learn such models from raw data.

Deep Learning is Learning Representations
Yann LeCun and Yoshua Bengio started this conference in 2013 because there was a need to a new, small, high-quality venue with an explicit focus on deep methods. Why is the conference called “Learning Representations?” Because the typical deep neural networks that are trained in an end-to-end fashion actually learn such intermediate representations. Traditional shallow methods are based on manually-engineered features on top of a trainable classifier, but deep methods learn a network of layers which learns those highly-desired features as well as the classifier. So what do you get when you blur the line between features and classifiers? You get representation learning. And this is what Deep Learning is all about.

ICLR Publishing Model: arXiv or bust
At ICLR, papers get posted on arXiv directly. And if you had any doubts that arXiv is just about the single awesomest thing to hit the research publication model since the Gutenberg press, let the success of ICLR be one more data point towards enlightenment. ICLR has essentially bypassed the old-fashioned publishing model where some third party like Elsevier says "you can publish with us and we'll put our logo on your papers and then charge regular people $30 for each paper they want to read." Sorry Elsevier, research doesn't work that way. Most research papers aren't good enough to be worth $30 for a copy. It is the entire body of academic research that provides true value, for which a single paper is just a mere door. You see, Elsevier, if you actually gave the world an exceptional research paper search engine, together with the ability to have 10-20 papers printed on decent quality paper for a $30/month subscription, then you would make a killing on researchers and I would endorse such a subscription. So ICLR, rightfully so, just said fuck it, we'll use arXiv as the method for disseminating our ideas. All future research conferences should use arXiv to disseminate papers. Anybody can download the papers, see when newer versions with corrections are posted, and they can print their own physical copies. But be warned: Deep Learning moves so fast that you've gotta be hitting refresh on arXiv on a weekly basis or you'll be schooled by some grad students in Canada.

Attendees of ICLR
Google DeepMind and Facebook’s FAIR constituted a large portion of the attendees. A lot of startups, researchers from the Googleplex, Twitter, NVIDIA, and startups such as Clarifai and Magic Leap. Overall a very young and vibrant crowd, and a very solid representation by super-smart 28-35 year olds.

Part II: Deep Learning Themes @ ICLR 2016

Incorporating Structure into Deep Learning
Raquel Urtasun from the University of Toronto gave a talk about Incorporating Structure in Deep Learning. See Raquel's Keynote video here. Many ideas from structure learning and graphical models were presented in her keynote. Raquel’s computer vision focus makes her work stand out, and she additionally showed some recent research snapshots from her upcoming CVPR 2016 work.

Raquel gave a wonderful 3D Indoor Understanding Tutorial at last year's CVPR 2015.


One of Raquel's strengths is her strong command of geometry, and her work covers both learning-based methods as well as multiple-view geometry. I strongly recommend keeping a close eye on her upcoming research ideas. Below are two bleeding-edge papers from Raquel's group -- the first one focuses on soccer field localization from a broadcast of such a game using branch-and-bound inference in an MRF.

Raquel's new work. Soccer Field Localization from Single Image. Homayounfar et al, 2016.

Soccer Field Localization from a Single Image. Namdar Homayounfar, Sanja Fidler, Raquel Urtasun. in arXiv:1604.02715.

The second upcoming paper from Raquel's group is on using Deep Learning for Dense Optical Flow, in the spirit of FlowNet, which I discussed in my ICCV 2015 hottest papers blog post. The technique is built on the observation that the scene is typically composed of a static background, as well as a relatively small number of traffic participants which move rigidly in 3D. The dense optical flow technique is applied to autonomous driving.


Deep Semantic Matching for Optical Flow. Min Bai, Wenjie Luo, Kaustav Kundu, Raquel Urtasun. In arXiv:1604.01827.

Reinforcement Learning
Sergey Levine gave an excellent Keynote on deep reinforcement learning and its application to Robotics[3]. See Sergey's Keynote video here. This kind of work is still the future, and there was very little robotics-related research in the main conference. It might not be surprising, because having an assembly of robotic arms is not cheap, and such gear is simply not present in most grad student research labs. Most ICLR work is pure software and some math theory, so a single GPU is all that is needed to start with a typical Deep Learning pipeline.

An army of robot arms jointly learning to grasp somewhere inside Google.

Take a look at the following interesting work which shows what Alex Krizhevsky, the author of the legendary 2012 AlexNet paper which rocked the world of object recognition, is currently doing. And it has to do with Deep Learning for Robotics, currently at Google.

Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection Sergey Levine, Peter Pastor, Alex Krizhevsky, Deirdre Quillen. In arXiv:1603.02199.

For those of you who want to learn more about Reinforcement Learning, perhaps it is time to check out Andrej Karpathy's Deep Reinforcement Learning: Pong From Pixels tutorial. One thing is for sure: when it comes to deep reinforcement learning, OpenAI is all-in.

Compressing Networks
Model Compression: The WinZip of
Neural Nets?
While NVIDIA might be today’s king of Deep Learning Hardware, I can’t help the feeling that there is a new player lurking in the shadows. You see, GPU-based mining of bitcoin didn’t last very long once people realized the economic value of owning bitcoins. Bitcoin very quickly transitioned into specialized FPGA hardware for running the underlying bitcoin computations, and the FPGAs of Deep Learning are right around the corner. Will NVIDIA remain the King? I see a fork in NVIDIA's future. You can continue producing hardware which pleases both gamers and machine learning researchers, or you can specialize. There is a plethora of interesting companies like Nervana Systems, Movidius, and most importantly Google, that don’t want to rely on power-hungry heatboxes known as GPUs, especially when it comes to scaling already trained deep learning models. Just take a look at Fathom by Movidius or the Google TPU.


But the world has already seen the economic value of Deep Nets, and the "software" side of deep nets isn't waiting for the FPGAs of neural nets. The software version of compressing neural networks is a very trendy topic. You basically want to take a beefy neural network and compress it down into a smaller, more efficient model. Binarizing the weights is one such strategy. Student-Teacher networks, where a smaller network is trained to mimic the larger network, are already here. And don't be surprised if within the next year we'll see 1MB-sized networks performing at the level of Oxford's VGGNet on the ImageNet 1000-way classification task.
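
To give a flavor of what such compression looks like, here is a tiny NumPy sketch of magnitude pruning followed by weight sharing with a small codebook, loosely in the spirit of the Deep Compression paper below (the 90% pruning rate and the 16-entry uniform codebook are illustrative choices of mine; the paper prunes iteratively and fits the codebook with k-means):

    import numpy as np

    W = np.random.randn(256, 256).astype(np.float32)   # a dummy weight matrix

    # 1) Magnitude pruning: zero out the smallest weights and keep a sparse mask.
    threshold = np.percentile(np.abs(W), 90)            # keep roughly 10% of the weights
    mask = np.abs(W) > threshold
    W_pruned = W * mask

    # 2) Weight sharing: snap surviving weights to a small set of shared values.
    surviving = W_pruned[mask]
    codebook = np.linspace(surviving.min(), surviving.max(), 16)  # 16 values -> 4-bit indices
    indices = np.abs(surviving[:, None] - codebook[None, :]).argmin(axis=1)
    W_compressed = W_pruned.copy()
    W_compressed[mask] = codebook[indices]

    # Instead of 256*256 float32 weights you now store a sparse index structure,
    # 4-bit codebook indices, and a 16-entry codebook (plus Huffman coding in the paper).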

Summary from ICLR 2016's Deep Compression paper by Han et al.

This year's ICLR brought a slew of Compression papers, the three which stood out are listed below.

Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. Song Han, Huizi Mao, and Bill Dally. In ICLR 2016. This paper won the Best Paper Award. See Han give the Deep Compression talk.

Neural Networks with Few Multiplications. Zhouhan Lin, Matthieu Courbariaux, Roland Memisevic, Yoshua Bengio. In ICLR 2016.

8-Bit Approximations for Parallelism in Deep Learning. Tim Dettmers. In ICLR 2016.

Unsupervised Learning
Philip Isola presented a very Efrosian paper on using Siamese Networks defined on patches to learn a patch similarity function in an unsupervised way. This patch-patch similarity function was used to create a local similarity graph defined over an image which can be used to discover the extent of objects. This reminds me of the Object Discovery line of research started by Alyosha Efros and the MIT group, where the basic idea is to abstain from using class labels in learning a similarity function.

Isola et al: A Siamese network has shared weights and can be used for learning embeddings or "similarity functions."


Learning visual groups from co-occurrences in space and time Phillip Isola, Daniel Zoran, Dilip Krishnan, Edward H. Adelson. In ICLR 2016.

Isola et al: Visual groupings applied to image patches, frames of a video, and a large scene dataset.
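
To make the Siamese idea concrete, here is a minimal PyTorch-style sketch of a shared-weight patch embedder trained with a contrastive-style loss on co-occurring patch pairs; the tiny architecture and the margin value are illustrative assumptions, not the paper's exact model:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class PatchEmbedder(nn.Module):
        # Maps a small image patch to an embedding; both branches share these weights.
        def __init__(self, dim=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(4), nn.Flatten(),
                nn.Linear(32 * 4 * 4, dim),
            )
        def forward(self, x):
            return F.normalize(self.net(x), dim=-1)

    def contrastive_loss(z1, z2, same_group, margin=0.5):
        # Pull co-occurring patches together, push the rest at least `margin` apart.
        d = (z1 - z2).pow(2).sum(dim=-1).sqrt()
        return torch.where(same_group, d.pow(2), F.relu(margin - d).pow(2)).mean()

    embed = PatchEmbedder()
    p1, p2 = torch.randn(8, 3, 16, 16), torch.randn(8, 3, 16, 16)  # batches of patch pairs
    co_occur = torch.randint(0, 2, (8,)).bool()                    # True if patches co-occur
    loss = contrastive_loss(embed(p1), embed(p2), co_occur)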


Initializing Networks: And why BatchNorm matters 
Getting a neural network up and running is more difficult than it seems. Several papers in ICLR 2016 suggested new ways of initializing networks. But practically speaking, deep net initialization is “essentially solved.” Initialization seems to be an area of research that truly became more of a “science” than an “art” once researchers introduced BatchNorm into their neural networks. BatchNorm is the butter of Deep Learning -- add it to everything and everything will taste better. But this wasn’t always the case!

In the early days, researchers had lots of problems with constructing an initial set of weights of a deep neural network such that back-propagation could learn anything. In fact, one of the reasons why the Neural Networks of the 90s died as a research program is precisely because it was well-known that a handful of top researchers knew how to tune their networks so that they could start automatically learning from data, but other researchers didn't know all of the right initialization tricks. It was as if the "black magic" inside the 90s NNs was just too intense. At some point, convex methods and kernel SVMs became the tools of choice -- with no need to initialize in a convex optimization setting, for almost a decade (1995 to 2005) researchers just ran away from deep methods. Once 2006 hit, Deep Architectures were working again with Hinton's magical deep Boltzmann Machines and unsupervised pretraining. Unsupervised pretraining didn't last long, as researchers got GPUs and found that once your data set is large enough (think ~2 million images in ImageNet), simple discriminative back-propagation does work. Random weight initialization strategies and cleverly tuned learning rates were quickly shared amongst researchers once 100s of them jumped on the ImageNet dataset. People started sharing code, and wonderful things happened!

But designing new neural networks for new problems was still problematic -- one wouldn't know exactly the best way to set multiple learning rates and random initialization magnitudes. But researchers got to work, and a handful of solid hackers from Google found out that the key problem was that poorly initialized networks were having a hard time flowing information through the networks. It's as if layer N was producing activations in one range and the subsequent layers were expecting information to be of another order of magnitude. So Szegedy and Ioffe from Google proposed a simple "trick" to whiten the flow of data as it passes through the network. Their trick, called "BatchNorm," involves using a normalization layer after each convolutional and/or fully-connected layer in a deep network. This normalization layer whitens the data by subtracting a mean and dividing by a standard deviation, thus producing roughly zero-mean, unit-variance activations as information flows through the network. So simple, yet so sweet. The idea of whitening data is so prevalent in all of machine learning that it's silly that it took deep learning researchers so long to re-discover the trick in the context of deep nets.
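
Here is the standard BatchNorm computation written out as a short NumPy sketch (training-time statistics only; the running averages used at test time are omitted):

    import numpy as np

    def batch_norm(x, gamma, beta, eps=1e-5):
        # x: (batch, features). Whiten each feature, then let the network re-scale and shift.
        mean = x.mean(axis=0)
        var = x.var(axis=0)
        x_hat = (x - mean) / np.sqrt(var + eps)   # roughly zero-mean, unit-variance activations
        return gamma * x_hat + beta               # learned scale and shift restore expressiveness

    x = np.random.randn(32, 64) * 5.0 + 3.0       # activations living at an awkward scale
    y = batch_norm(x, gamma=np.ones(64), beta=np.zeros(64))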

Data-dependent Initializations of Convolutional Neural Networks Philipp Krähenbühl, Carl Doersch, Jeff Donahue, Trevor Darrell. In ICLR 2016. Carl Doersch, a fellow CMU PhD, is going to DeepMind, so there goes another point for DeepMind.


Backprop Tricks
Injecting noise into the gradient seems to work. And this reminds me of the common grad student dilemma where you fix a bug in your gradient calculation, and your learning algorithm does worse. You see, when you were computing the derivative on the white board, you probably made a silly mistake like messing up a coefficient that balances two terms or forgetting an additive / multiplicative term somewhere.  However, with a high probability, your “buggy gradient” was actually correlated with the true “gradient”. And in many scenarios, a quantity correlated with the true gradient is better than the true gradient.  It is a certain form of regularization that hasn’t been adequately addressed in the research community. What kinds of “buggy gradients” are actually good for learning? And is there a space of “buggy gradients” that are cheaper to compute than “true gradients”? These “FastGrad” methods could speed up training deep networks, at least for the first several epochs. Maybe by ICLR 2017 somebody will decide to pursue this research track.
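
The gradient-noise trick from the first paper below fits in a few lines; the decay schedule here follows the general form used in that line of work, with constants that should be treated as assumptions rather than the paper's exact settings:

    import numpy as np

    def noisy_gradient(grad, step, eta=0.3, gamma=0.55):
        # Add annealed Gaussian noise to the gradient before the parameter update.
        sigma2 = eta / (1.0 + step) ** gamma      # noise variance decays as training progresses
        return grad + np.random.normal(0.0, np.sqrt(sigma2), size=grad.shape)

    # Inside a training loop you would simply use:
    #   params -= learning_rate * noisy_gradient(grad, step)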


Adding Gradient Noise Improves Learning for Very Deep Networks. Arvind Neelakantan, Luke Vilnis, Quoc V. Le, Ilya Sutskever, Lukasz Kaiser, Karol Kurach, James Martens. In ICLR 2016.

Robust Convolutional Neural Networks under Adversarial Noise Jonghoon Jin, Aysegul Dundar, Eugenio Culurciello. In ICLR 2016.

Attention: Focusing Computations
Attention-based methods are all about treating different "interesting" areas with more care than the "boring" areas. Not all pixels are equal, and people are able to quickly focus on the interesting bits of a static picture. ICLR 2016's most interesting "attention" paper was the Dynamic Capacity Networks paper from Aaron Courville's group at the University of Montreal. Hugo Larochelle, another key researcher with strong ties to the Deep Learning mafia, is now a Research Scientist at Twitter.
Dynamic Capacity Networks Amjad Almahairi, Nicolas Ballas, Tim Cooijmans, Yin Zheng, Hugo Larochelle, Aaron Courville. In ICLR 2016.


The “ResNet trick”: Going Mega Deep because it's Mega Fun
We saw some new papers on the new "ResNet" trick which emerged within the last few months in the Deep Learning community. The ResNet trick is the "Residual Net" trick that gives us a rule for creating a deep stack of layers. Because each residual layer essentially learns to either pass the raw data through or mix in some combination of a non-linear transformation, the flow of information is much smoother. This "control of flow" that comes with residual blocks lets you build VGG-style networks that are quite deep.
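
A minimal PyTorch-style sketch of the generic residual block pattern (this is the textbook form, not the exact blocks from the papers below):

    import torch.nn as nn
    import torch.nn.functional as F

    class ResidualBlock(nn.Module):
        # y = x + F(x): the block only needs to learn a correction to the identity.
        def __init__(self, channels):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
            self.bn1 = nn.BatchNorm2d(channels)
            self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
            self.bn2 = nn.BatchNorm2d(channels)

        def forward(self, x):
            out = F.relu(self.bn1(self.conv1(x)))
            out = self.bn2(self.conv2(out))
            return F.relu(out + x)   # the identity shortcut keeps information flowing

    # Stacking many such blocks is how the very deep networks in these papers are built:
    net = nn.Sequential(*[ResidualBlock(64) for _ in range(20)])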

Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke. In ICLR 2016.



Resnet in Resnet: Generalizing Residual Architectures Sasha Targ, Diogo Almeida, Kevin Lyman. In ICLR 2016.

Deep Metric Learning and Learning Subcategories
A great paper, presented by Manohar Paluri of Facebook, focused on a new way to think about deep metric learning. The paper is "Metric Learning with Adaptive Density Discrimination" and reminds me of my own research from CMU. Their key idea can be distilled to the "anti-category" argument. Basically, you build into your algorithm the intuition that not all elements of a category C should collapse into a single unique representation. Due to the visual variety within a category, you only make the assumption that an element X of category C is going to be similar to a subset of the other members of C, and not all of them. In their paper, they make the assumption that all members of category C belong to a set of latent subcategories, and EM-like learning alternates between finding subcategory assignments and updating the distance metric. During my PhD, we took this idea even further and built Exemplar-SVMs, which were the smallest possible subcategories with a single positive "exemplar" member.

Manohar started his research as a member of the FAIR team, which focuses more on R&D work, but metric learning ideas are very product-focused, and the paper is a great example of a technology that seems to be "product-ready." I envision dozens of Facebook products that can benefit from such data-derived adaptive deep distance metrics.

Metric Learning with Adaptive Density Discrimination. Oren Rippel, Manohar Paluri, Piotr Dollar, Lubomir Bourdev. In ICLR 2016.


Deep Learning Calculators
LSTMs, Deep Neural Turing Machines, and what I call “Deep Learning Calculators” were big at the conference. Some people say, “Just because you can use deep learning to build a calculator, it doesn’t mean you should." And for some people, Deep Learning is the Holy-Grail-Titan-Power-Hammer, and everything that can be described with words should be built using deep learning components. Nevertheless, it's an exciting time for Deep Turing Machines.

The winner of the Best Paper Award was the paper, Neural Programmer-Interpreters by Scott Reed and Nando de Freitas. An interesting way to blend deep learning with the theory of computation. If you’re wondering what it would look like to use Deep Learning to learn quicksort, then check out their paper. And it seems like Scott Reed is going to Google DeepMind, so you can tell where they’re placing their bets.

Neural Programmer-Interpreters. Scott Reed, Nando de Freitas. In ICLR 2016.

Another interesting paper by some OpenAI guys is “Neural Random-Access Machines” which is going to be another fan favorite for those who love Deep Learning Calculators.

Neural Random-Access Machines. Karol Kurach, Marcin Andrychowicz, Ilya Sutskever. In ICLR 2016.

Computer Vision Applications
Boundary detection is a common computer vision task, where the goal is to predict boundaries between objects. CV folks have been using image pyramids, or multi-level processing, for quite some time. Check out the following Deep Boundary paper which aggregates information across multiple spatial resolutions.

Pushing the Boundaries of Boundary Detection using Deep Learning Iasonas Kokkinos, In ICLR 2016.

A great application for RNNs is to "unfold" an image into multiple layers. In the context of object detection, the goal is to decompose an image into its parts. The following figure explains it best, but if you've been wondering where to use RNNs in your computer vision pipeline, check out their paper.

Learning to decompose for object detection and instance segmentation Eunbyung Park, Alexander C. Berg. In ICLR 2016.

Dilated convolutions are a "trick" which allows you to increase your network's receptive field size and scene segmentation is one of the best application domains for such dilations.

Multi-Scale Context Aggregation by Dilated Convolutions Fisher Yu, Vladlen Koltun. In ICLR 2016.
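
A quick sketch of the dilation trick: with the same 3x3 kernel and the same number of parameters, a larger dilation factor widens the receptive field while keeping the output resolution (the setup is illustrative, not the paper's context module):

    import torch
    import torch.nn as nn

    x = torch.randn(1, 64, 128, 128)   # a dummy feature map

    conv_d1 = nn.Conv2d(64, 64, kernel_size=3, padding=1, dilation=1)  # sees a 3x3 window
    conv_d2 = nn.Conv2d(64, 64, kernel_size=3, padding=2, dilation=2)  # sees a 5x5 window
    conv_d4 = nn.Conv2d(64, 64, kernel_size=3, padding=4, dilation=4)  # sees a 9x9 window

    # Same parameter count and same spatial resolution, exponentially growing receptive field.
    for conv in (conv_d1, conv_d2, conv_d4):
        assert conv(x).shape == x.shape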



Visualizing Networks
Two of the best “visualization” papers were “Do Neural Networks Learn the same thing?” by
Jason Yosinski (now going to Geometric Intelligence, Inc.) and “Visualizing and Understanding Recurrent Networks” presented by Andrej Karpathy (now going to OpenAI). Yosinski presented his work on studying what happens when you learn two different networks using different initializations. Do the nets learn the same thing? I remember a great conversation with Jason about figuring out if the neurons in network A can be represented as linear combinations of network B, and his visualizations helped make the case. Andrej’s visualizations of recurrent networks are best consumed in presentation/blog form[2]. For those of you that haven’t yet seen Andrej’s analysis of Recurrent Nets on Hacker News, check it out here.


Convergent Learning: Do different neural networks learn the same representations? Yixuan Li, Jason Yosinski, Jeff Clune, Hod Lipson, John Hopcroft. In ICLR 2016. See Yosinski's video here.


Visualizing and Understanding Recurrent Networks Andrej Karpathy, Justin Johnson, Li Fei-Fei. In ICLR 2016.

Do Deep Convolutional Nets Really Need to be Deep (Or Even Convolutional)? 
Figure from Do Nets have to be Deep?
This was the key question asked in the paper presented by Rich Caruana. (Dr. Caruana is now at Microsoft, but I remember meeting him at Cornell eleven years ago.) Their paper's two key results are quite meaningful if you sit back and think about them. First, there is something truly special about convolutional layers: when applied to images, they are significantly better than using solely fully-connected layers -- there's something about the 2D structure of images and the 2D structure of filters that makes convolutional layers get a lot of value out of their parameters. Secondly, we now have teacher-student training algorithms which you can use to have a shallower network "mimic" the teacher's responses on a large dataset. These shallower networks are able to learn much better using a teacher; in fact, such shallow networks produce inferior results when they are trained directly on the teacher's training set. So it seems you can go [Data to MegaDeep], and [MegaDeep to MiniDeep], but you cannot directly go from [Data to MiniDeep].


Do Deep Convolutional Nets Really Need to be Deep (Or Even Convolutional)? Gregor Urban, Krzysztof J. Geras, Samira Ebrahimi Kahou, Ozlem Aslan, Shengjie Wang, Rich Caruana, Abdelrahman Mohamed, Matthai Philipose, Matt Richardson. In ICLR 2016.
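
The teacher-student "mimic" training mentioned above boils down to regressing a small network onto a big network's outputs; here is a minimal PyTorch-style sketch (regressing on logits with an MSE loss follows the general mimic-learning recipe and is an assumption on my part, not necessarily this paper's exact setup):

    import torch
    import torch.nn as nn

    teacher = nn.Sequential(nn.Linear(128, 512), nn.ReLU(), nn.Linear(512, 10))  # big, pre-trained
    student = nn.Sequential(nn.Linear(128, 32), nn.ReLU(), nn.Linear(32, 10))    # small, to be trained

    optimizer = torch.optim.SGD(student.parameters(), lr=0.01)
    mimic_loss = nn.MSELoss()

    for _ in range(100):                       # loop over transfer data (labels not required)
        x = torch.randn(64, 128)
        with torch.no_grad():
            target_logits = teacher(x)         # the teacher's outputs are the regression targets
        loss = mimic_loss(student(x), target_logits)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()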


Another interesting idea on the [MegaDeep to MiniDeep] and [MiniDeep to MegaDeep] front,



Net2Net: Accelerating Learning via Knowledge Transfer Tianqi Chen, Ian Goodfellow, Jonathon Shlens. In ICLR 2016.


Language Modeling with LSTMs
There was also considerable focus on methods that deal with large bodies of text. Chris Dyer (who is supposedly also going to DeepMind) gave a keynote asking the question "Should Model Architecture Reflect Linguistic Structure?" See Chris Dyer's Keynote video here. Some of his key take-aways from comparing word-level embeddings vs character-level embeddings are that for different languages, different methods work better. For languages which have a rich syntax, character-level encodings outperform word-level encodings.

Improved Transition-Based Parsing by Modeling Characters instead of Words with LSTMs Miguel Ballesteros, Chris Dyer, Noah A. Smith. In Proceedings of EMNLP 2015.

An interesting approach, with a great presentation by Ivan Vendrov, was “Order-Embeddings of Images and Language" by Ivan Vendrov, Ryan Kiros, Sanja Fidler, and Raquel Urtasun which showed a great intuitive coordinate-system-y way for thinking about concepts. I really love these coordinate system analogies and I’m all for new ways of thinking about classical problems.



Order-Embeddings of Images and Language Ivan Vendrov, Ryan Kiros, Sanja Fidler, Raquel Urtasun. In ICLR 2016. See Video here.



Training-Free Methods: Brain-dead applications of CNNs to Image Matching

These techniques reuse the activation maps of deep neural networks trained on an ImageNet classification task for other important computer vision tasks. They employ clever ways of matching image regions and, in the following ICLR paper, are applied to smart image retrieval.

Particular object retrieval with integral max-pooling of CNN activations. Giorgos Tolias, Ronan Sicre, Hervé Jégou. In ICLR 2016.

This reminds me of the RSS 2015 paper which uses ConvNets to match landmarks for a relocalization-like SLAM task.


Place Recognition with ConvNet Landmarks: Viewpoint-Robust, Condition-Robust, Training-Free. Niko Sunderhauf, Sareh Shirazi, Adam Jacobson, Feras Dayoub, Edward Pepperell, Ben Upcroft, and Michael Milford. In RSS 2015.


Gaussian Processes and Auto Encoders

Gaussian Processes used to be quite popular at NIPS, sometimes used for vision problems, but they have been mostly “forgotten” in the era of Deep Learning. VAEs, or Variational Auto Encoders, used to be much more popular when pretraining was the only way to train deep neural nets. However, with new techniques like adversarial networks, people keep revisiting Auto Encoders, because we still “hope” that something as simple as an encoder / decoder network should give us the unsupervised learning power we all seek, deep down inside. VAEs got quite a lot of action but didn't make the cut for today's blog post.

Geometric Methods
Overall, very little content pertaining to the SfM / SLAM side of the vision problem was present at ICLR 2016. This kind of work is very common at CVPR, and it's a bit of a surprise that there wasn't a lot of Robotics work at ICLR. It should be noted that the techniques used in SfM/SLAM are more based on multiple-view geometry and linear algebra than the data-driven deep learning of today.

Perhaps a better venue for Robotics and Deep Learning will be the June 2016 workshop titled Are the Sceptics Right? Limits and Potentials of Deep Learning in Robotics. This workshop is being held at RSS 2016, one of the world's leading Robotics conferences.

Part III: Quo Vadis Deep Learning?

Neural Net Compression is going to be big -- real-world applications demand it. The algos guys aren't going to wait for TPUs and VPUs to become mainstream. Deep Nets which can look at a picture and tell you what’s going on are going to be inside every single device which has a camera. In fact, I don’t see any reason why all cameras by 2020 won’t be able to produce a high-quality RGB image as well as a neural network response vector. New image formats will even have such “deep interpretation vectors” saved directly alongside the image. And it's all going to be a neural net, in one shape or another.

OpenAI had a strong presence at ICLR 2016, and I feel like every week a new PhD joins OpenAI. Google DeepMind and Facebook FAIR had a large number of papers. Google demoed a real-time version of deep-learning based style transfer using TensorFlow. Microsoft is no longer King of research. Startups were giving out little toys -- Clarifai even gave out free sandals. Graduates with well-tuned Deep Learning skills will continue being in high demand, but once the next generation of AI-driven startups emerges, it is only those willing to transfer their academic skills into a product-focused, world-facing mindset -- aka the upcoming wave of deep entrepreneurs -- that will make serious $$$.

Research-wise, arXiv is a big productivity booster. Hopefully, you now know where to place your future deep learning research bets, have enough new insights to breathe some inspiration into your favorite research problem, and have gotten a taste of where the top researchers are heading. I encourage you to turn off your computer and have a white-board conversation with your colleagues about deep learning. Grab a friend, teach them some tricks.

I'll see you all at CVPR 2016. Until then, keep learning.

Related computervisionblog.com Blog Posts

Why your lab needs a reading group. May 2012
ICCV 2015: 21 Hottest Research Papers December 2015
Deep Down the Rabbit Hole: CVPR 2015 and Beyond June 2015
The Deep Learning Gold Rush of 2015 November 2015
Deep Learning vs Machine Learning vs Pattern Recognition March 2015
Deep Learning vs Probabilistic Graphical Models April 2015
Future of Real-time SLAM and "Deep Learning vs SLAM" January 2016

Relevant Outside Links

[1] Welcome to the AI Conspiracy: The 'Canadian Mafia' Behind Tech's Latest Craze @ <re/code>
[2] The Unreasonable Effectiveness of Recurrent Neural Networks @ Andrej Karpathy's Blog
[3] Deep Learning for Robots: Learning from Large-Scale Interaction. @ Google Research Blog


          The Future of Real-Time SLAM and Deep Learning vs SLAM        
Last month's International Conference on Computer Vision (ICCV) was full of Deep Learning techniques, but before we declare an all-out ConvNet victory, let's see how the other "non-learning" geometric side of computer vision is doing.  Simultaneous Localization and Mapping, or SLAM, is arguably one of the most important algorithms in Robotics, with pioneering work done by both the computer vision and robotics research communities.  Today I'll be summarizing my key points from ICCV's Future of Real-Time SLAM Workshop, which was held on the last day of the conference (December 18th, 2015).

Today's post contains a brief introduction to SLAM, a detailed description of what happened at the workshop (with summaries of all 7 talks), and some take-home messages from the Deep Learning-focused panel discussion at the end of the session.

SLAM visualizations. Can you identify any of these SLAM algorithms?

Part I: Why SLAM Matters

Visual SLAM algorithms are able to simultaneously build 3D maps of the world while tracking the location and orientation of the camera (hand-held or head-mounted for AR or mounted on a robot). SLAM algorithms are complementary to ConvNets and Deep Learning: SLAM focuses on geometric problems and Deep Learning is the master of perception (recognition) problems. If you want a robot to go towards your refrigerator without hitting a wall, use SLAM. If you want the robot to identify the items inside your fridge, use ConvNets.



Basics of SfM/SLAM: From point observations and intrinsic camera parameters, the 3D structure of a scene is computed from the estimated motion of the camera. For details, see the openMVG website.

SLAM is a real-time version of Structure from Motion (SfM). Visual SLAM or vision-based SLAM is a camera-only variant of SLAM which forgoes expensive laser sensors and inertial measurement units (IMUs). Monocular SLAM uses a single camera while non-monocular SLAM typically uses a pre-calibrated fixed-baseline stereo camera rig. SLAM is a prime example of what is called a "Geometric Method" in Computer Vision. In fact, CMU's Robotics Institute splits the graduate level computer vision curriculum into a Learning-based Methods in Vision course and a separate Geometry-Based Methods in Vision course.


Structure from Motion vs Visual SLAM
Structure from Motion (SfM) and SLAM are solving a very similar problem, but while SfM is traditionally performed in an offline fashion, SLAM has been slowly moving towards the low-power / real-time / single RGB camera mode of operation. Many of today’s top experts in Structure from Motion work for some of the world’s biggest tech companies, helping make maps better. Successful mapping products like Google Maps could not have been built without intimate knowledge of multiple-view geometry, SfM, and SLAM.  A typical SfM problem is the following: given a large collection of photos of a single outdoor structure (like the Colosseum), construct a 3D model of the structure and determine each camera's pose. The image collection is processed in an offline setting, and large reconstructions can take anywhere between hours and days.
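For the curious, here is roughly what the two-view core of such a pipeline looks like with OpenCV: match features, estimate the essential matrix with RANSAC, recover the relative pose, and triangulate. The image paths and intrinsics matrix K are placeholders of my own; a real SfM system chains many views together and refines everything with bundle adjustment.

```python
import numpy as np
import cv2

# Placeholder inputs: two photos of the same structure plus camera intrinsics.
img1 = cv2.imread("view1.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("view2.jpg", cv2.IMREAD_GRAYSCALE)
K = np.array([[1000.0, 0, 640], [0, 1000.0, 360], [0, 0, 1]])

# 1. Detect and match local features.
orb = cv2.ORB_create(4000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)
matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

# 2. Estimate the relative camera pose from the essential matrix (RANSAC).
E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
_, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)

# 3. Triangulate inlier matches into 3D points (the "structure").
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t])
inl = mask.ravel() > 0
X_h = cv2.triangulatePoints(P1, P2, pts1[inl].T, pts2[inl].T)
X = (X_h[:3] / X_h[3]).T
print("recovered", len(X), "3D points")
```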


SfM Software: Bundler is one of the most successful SfM open source libraries

Here are some popular SfM-related software libraries:

Visual SLAM vs Autonomous Driving
While self-driving cars are one of the most important applications of SLAM, according to Andrew Davison, one of the workshop organizers, SLAM for Autonomous Vehicles deserves its own research track. (And as we'll see, none of the workshop presenters talked about self-driving cars). For many years to come it will make sense to continue studying SLAM from a research perspective, independent of any single Holy-Grail application. While there are just too many system-level details and tricks involved with autonomous vehicles, research-grade SLAM systems require very little more than a webcam, knowledge of algorithms, and elbow grease. As a research topic, Visual SLAM is much friendlier to thousands of early-stage PhD students who’ll first need years of in-lab experience with SLAM before even starting to think about expensive robotic platforms such as self-driving cars.



Google's Self-Driving Car's perception system. From IEEE Spectrum's "How Google's Self-Driving Car Works"

Related: March 2015 blog post, Mobileye's quest to put Deep Learning inside every new car.
Related: One way Google's Cars Localize Themselves

Part II: The Future of Real-time SLAM

Now it's time to officially summarize and comment on the presentations from The Future of Real-time SLAM workshop. Andrew Davison started the day with an excellent historical overview of SLAM called 15 years of vision-based SLAM, and his slides have good content for an introductory robotics course.

For those of you who don’t know Andy, he is the one and only Professor Andrew Davison of Imperial College London.  Most known for his 2003 MonoSLAM system, he was one of the first to show how to build SLAM systems from a single “monocular” camera at a time when just about everybody thought you needed a stereo “binocular” camera rig. More recently, his work has influenced the trajectory of companies such as Dyson and the capabilities of their robotic systems (e.g., the brand new Dyson360).

I remember Professor Davison from the Visual SLAM tutorial he gave at the BMVC Conference back in 2007. Surprisingly, very little has changed in SLAM compared to the rest of the machine-learning-heavy work being done at the main vision conferences. In the past 8 years, object recognition has undergone 2-3 mini revolutions, while today's SLAM systems don't look much different than they did 8 years ago. The best way to see the progress of SLAM is to take a look at the most successful and memorable systems. In Davison’s workshop introduction talk, he discussed some of these exemplary systems which were produced by the research community over the last 10-15 years:

  • MonoSLAM
  • PTAM
  • FAB-MAP
  • DTAM
  • KinectFusion

Davison vs Horn: The next chapter in Robot Vision
Davison also mentioned that he is working on a new Robot Vision book, which should be an exciting treat for researchers in computer vision, robotics, and artificial intelligence. The last Robot Vision book was written by B.K. Horn (1986), and it’s about time for an updated take on Robot Vision. 

A new robot vision book?

While I’ll gladly read a tome that focuses on the philosophy of robot vision, personally I would like the book to focus on practical algorithms for robot vision, like the excellent Multiple View Geometry book by Hartley and Zisserman or Probabilistic Robotics by Thrun, Burgard, and Fox. A "cookbook" of visual SLAM problems would be a welcome addition to any serious vision researcher's collection.

Related: Davison's 15-years of vision-based SLAM slides

Talk 1: Christian Kerl on Continuous Trajectories in SLAM
The first talk, by Christian Kerl, presented a dense tracking method to estimate a continuous-time trajectory. The key observation is that most SLAM systems estimate camera poses at a discrete number of time steps (either the key frames, which are spaced several seconds apart, or the individual frames, which are spaced approximately 1/25s apart).


Continuous Trajectories vs Discrete Time Points. SLAM/SfM usually uses discrete time points, but why not go continuous?

Much of Kerl’s talk was focused on undoing the damage of rolling shutter cameras, and the system demo’ed by Kerl paid meticulous attention to modeling and removing these adverse rolling shutter effects.

Undoing the damage of rolling shutter in Visual SLAM.


Related: Kerl's Dense continous-time tracking and mapping slides.
Related: Dense Continuous-Time Tracking and Mapping with Rolling Shutter RGB-D Cameras (C. Kerl, J. Stueckler, D. Cremers), In IEEE International Conference on Computer Vision (ICCV), 2015. [pdf]

Talk 2: Semi-Dense Direct SLAM by Jakob Engel
LSD-SLAM came out at ECCV 2014 and is one of my favorite SLAM systems today! Jakob Engel was there to present his system and show the crowd some of the coolest SLAM visualizations in town. LSD-SLAM is an acronym for Large-Scale Direct Monocular SLAM. LSD-SLAM is an important system for SLAM researchers because it does not use corners or any other local features. Direct tracking is performed by image-to-image alignment using a coarse-to-fine algorithm with a robust Huber loss. This is quite different from the feature-based systems out there. Depth estimation uses an inverse depth parametrization (like many other SLAM systems) and uses a large number of relatively small-baseline image pairs. Rather than relying on image features, the algorithm is effectively performing “texture tracking”. Global mapping is performed by creating and solving a pose graph "bundle adjustment" optimization problem, and all of this works in real-time. The method is semi-dense because it only estimates depth at pixels near image boundaries. LSD-SLAM output is denser than traditional features, but not fully dense like Kinect-style RGBD SLAM.


LSD-SLAM in Action: LSD-SLAM generates both a camera trajectory and a semi-dense 3D scene reconstruction. This approach works in real-time, does not use feature points as primitives, and performs direct image-to-image alignment.
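A toy sketch of the direct-alignment idea, under my own simplifying assumptions (grayscale float images, a given inverse-depth map, nearest-neighbour sampling): warp high-gradient reference pixels into the target frame under a candidate pose and sum Huber-weighted photometric residuals. Real LSD-SLAM minimizes this kind of cost coarse-to-fine with Gauss-Newton updates on SE(3); this is only the cost evaluation.

```python
import numpy as np

def huber_weight(r, delta=10.0):
    """Huber weights: 1 inside the quadratic region, delta/|r| outside."""
    a = np.abs(r)
    return np.where(a <= delta, 1.0, delta / np.maximum(a, 1e-9))

def photometric_cost(I_ref, inv_depth, I_tgt, K, R, t):
    """Huber-weighted photometric error of warping high-gradient reference
    pixels into the target frame under pose (R, t). Toy sketch only."""
    h, w = I_ref.shape
    gy, gx = np.gradient(I_ref)
    v, u = np.nonzero(np.hypot(gx, gy) > 10)              # semi-dense: strong gradients
    z = 1.0 / np.maximum(inv_depth[v, u], 1e-6)
    rays = np.linalg.inv(K) @ np.stack([u, v, np.ones_like(u)]).astype(float)
    P = rays * z                                          # back-project to 3D (ref frame)
    q = K @ (R @ P + t[:, None])                          # project into target frame
    ok = q[2] > 1e-6                                      # keep points in front of camera
    u2 = np.round(q[0, ok] / q[2, ok]).astype(int)
    v2 = np.round(q[1, ok] / q[2, ok]).astype(int)
    vis = (u2 >= 0) & (u2 < w) & (v2 >= 0) & (v2 < h)
    r = I_tgt[v2[vis], u2[vis]] - I_ref[v[ok][vis], u[ok][vis]]   # photometric residuals
    return np.sum(huber_weight(r) * r ** 2)
```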

Engel gave us an overview of the original LSD-SLAM system as well as a handful of new results, extending their initial system to more creative applications and to more interesting deployments. (See paper citations below)

Related: LSD-SLAM Open-Source Code on github LSD-SLAM project webpage
Related: LSD-SLAM: Large-Scale Direct Monocular SLAM (J. Engel, T. Schöps, D. Cremers), In European Conference on Computer Vision (ECCV), 2014. [pdf] [youtube video]

An extension to LSD-SLAM, Omni LSD-SLAM, was motivated by the observation that the pinhole model does not allow for a large field of view. This work was presented at IROS 2015 (Caruso is first author) and allows a large field of view (ideally more than 180 degrees). From Engel’s presentation it was pretty clear that you can perform ballerina-like motions (extreme rotations) while walking around your office and holding the camera. This is one of those worst-case scenarios for narrow field of view SLAM, yet it works quite well in Omni LSD-SLAM.

Omnidirectional LSD-SLAM Model. See Engel's Semi-Dense Direct SLAM presentation slides.

Related: Large-Scale Direct SLAM for Omnidirectional Cameras (D. Caruso, J. Engel, D. Cremers), In International Conference on Intelligent Robots and Systems (IROS), 2015.  [pdf] [youtube video]

Stereo LSD-SLAM is an extension of LSD-SLAM to a binocular camera rig. This helps in getting the absolute scale, initialization is instantaneous, and there are no issues with strong rotation. While monocular SLAM is very exciting from an academic point of view, if your robot is a $30,000 car or a $10,000 drone prototype, you should have a good reason to not use a two+ camera rig. Stereo LSD-SLAM performs quite competitively on SLAM benchmarks.


Stereo LSD-SLAM. Excellent results on KITTI vehicle-SLAM dataset.

Stereo LSD-SLAM is quite practical, optimizes a pose graph in SE(3), and includes a correction for auto exposure. The goal of auto-exposure correction is to make the error function invariant to affine lighting changes. The underlying parameters of the color-space affine transform are estimated during matching, but thrown away to estimate the image-to-image error. From Engel's talk, outliers (often caused by over-exposed image pixels) tend to be a problem, and much care needs to be taken to handle their effects.

Related: Large-Scale Direct SLAM with Stereo Cameras (J. Engel, J. Stueckler, D. Cremers), In International Conference on Intelligent Robots and Systems (IROS), 2015.  [pdf] [youtube video]
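The affine lighting model itself is tiny: fit a gain and offset between matched intensities by least squares, then judge the alignment on the corrected residuals. The sketch below is my own toy illustration of that idea, not Engel's implementation.

```python
import numpy as np

def affine_exposure_correct(i_ref, i_tgt):
    """Fit the affine brightness model i_tgt ~ a * i_ref + b over matched
    pixel intensities (1-D arrays). Returns (a, b) and the corrected residuals."""
    A = np.stack([i_ref, np.ones_like(i_ref)], axis=1)
    (a, b), *_ = np.linalg.lstsq(A, i_tgt, rcond=None)
    residuals = i_tgt - (a * i_ref + b)   # error after removing the exposure change
    return a, b, residuals

# Toy example: the target is the reference under a gain/offset change plus noise.
rng = np.random.default_rng(0)
i_ref = rng.uniform(0, 255, 500)
i_tgt = 1.3 * i_ref + 12 + rng.normal(0, 2, 500)
a, b, res = affine_exposure_correct(i_ref, i_tgt)
print(round(a, 2), round(b, 1), round(np.abs(res).mean(), 2))
```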

Later in his presentation, Engel gave us a sneak peek at new research about integrating both stereo and inertial sensors. For details, you’ll have to keep hitting refresh on Arxiv or talk to Usenko/Engel in person. On the applications side, Engel's presentation included updated videos of an Autonomous Quadrotor driven by LSD-SLAM. The flight starts with an up-down motion to get the scale estimate, and an octomap is used to estimate free space so that the quadrotor can navigate on its own. Stay tuned for an official publication...
Quadrotor running Stereo LSD-SLAM. 

The story of LSD-SLAM is also the story of feature-based vs direct methods, and Engel gave both sides of the debate a fair treatment. Feature-based methods are engineered to work on top of Harris-like corners, while direct methods use the entire image for alignment. Feature-based methods are faster (as of 2015), but direct methods are good for parallelism. Outliers can be retroactively removed from feature-based systems, while direct methods are less flexible w.r.t. outliers. Rolling shutter is a bigger problem for direct methods and it makes sense to use a global shutter or a rolling shutter model (see Kerl’s work). Feature-based methods require making decisions using incomplete information, but direct methods can use much more information. Feature-based methods have no need for good initialization, while direct methods need some clever tricks for initialization. There are only about 4 years of research on direct methods and 20+ on sparse methods. Engel is optimistic that direct methods will one day rise to the top, and so am I.


Feature-based vs direct methods of building SLAM systems. Slide from Engel's talk.

At the end of Engel's presentation, Davison asked about semantic segmentation and Engel wondered whether semantic segmentation can be performed directly on semi-dense "near-image-boundary" data.  However, my personal opinion is that there are better ways to apply semantic segmentation to LSD-like SLAM systems. Semi-dense SLAM can focus on geometric information near boundaries, while object recognition can focus on reliable semantics away from the same boundaries, potentially creating a hybrid geometric/semantic interpretation of the image.

Related: Engel's Semi-Dense Direct SLAM presentation slides

Talk 3: Sattler on The challenges of Large-Scale Localization and Mapping
Torsten Sattler gave a talk on large-scale localization and mapping. The motivation for this work is to perform 6-dof localization inside an existing map, especially for mobile localization. One of the key points in the talk was that when you are using traditional feature-based methods, storing your descriptors soon becomes very costly. Techniques such as visual vocabularies (remember product quantization?) can significantly reduce memory overhead, and with clever optimization, storing descriptors eventually stops being the memory bottleneck.

Another important take-home message from Sattler’s talk is that the number of inliers is not actually a good confidence measure for camera pose estimation.  When the feature points are all concentrated in a single part of the image, camera localization can be kilometers away! A better measure of confidence is the “effective inlier count” which looks at the area spanned by the inliers as a fraction of total image area.  What you really want is feature matches from all over the image — if the information is spread out across the image you get a much better pose estimate.
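Here is one way such a measure could be prototyped (my own simplified reading of the idea, not Sattler's exact formulation): weight the raw inlier count by the fraction of the image covered by the inliers' convex hull.

```python
import numpy as np
import cv2

def effective_inlier_score(inlier_pts, img_w, img_h):
    """Rough 'effective inlier count' proxy: inlier count weighted by the
    fraction of the image covered by the inliers' convex hull."""
    if len(inlier_pts) < 3:
        return 0.0
    hull = cv2.convexHull(np.float32(inlier_pts))
    coverage = cv2.contourArea(hull) / float(img_w * img_h)
    return coverage * len(inlier_pts)

# Two pose estimates with the same raw inlier count: spread-out inliers
# score much higher than inliers clustered in one corner of the image.
rng = np.random.default_rng(1)
spread = rng.uniform([0, 0], [640, 480], size=(100, 2))
clustered = rng.uniform([0, 0], [60, 60], size=(100, 2))
print(effective_inlier_score(spread, 640, 480))
print(effective_inlier_score(clustered, 640, 480))
```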

Sattler’s take on the future of real-time SLAM is the following: we should focus on compact map representations, we should get better at understanding camera pose estimate confidences (like down-weighting features from trees), and we should work on more challenging scenes (such as worlds with planar structures and nighttime localization against daytime maps).


Mobile Localisation: Sattler's key problem is localizing yourself inside a large city with a single smartphone picture


Related: Scalable 6-DOF Localization on Mobile Devices. Sven Middelberg, Torsten Sattler, Ole Untzelmann, Leif Kobbelt. In ECCV 2014. [pdf]
Related: Torsten Sattler 's The challenges of large-scale localisation and mapping slides

Talk 4: Mur-Artal on Feature-based vs Direct-Methods
Raúl Mur-Artal, the creator of ORB-SLAM, dedicated his entire presentation to the Feature-based vs Direct-method debate in SLAM and he's definitely on the feature-based side. ORB-SLAM is available as an open-source SLAM package and it is hard to beat. During his evaluation of ORB-SLAM vs PTAM it seems that PTAM actually fails quite often (at least on the TUM RGB-D benchmark). LSD-SLAM errors are also much higher on the TUM RGB-D benchmark than expected.

Feature-Based SLAM vs Direct SLAM. See Mur-Artal's Should we still do sparse feature based SLAM? presentation slides

Related: Mur-Artal's Should we still do sparse-feature based SLAM? slides
Related: ORB-SLAM: A Versatile and Accurate Monocular SLAM System. R. Mur-Artal, J. M. M. Montiel and J. D. Tardos. IEEE Transactions on Robotics, 2015. [pdf]
Related: ORB-SLAM Open-source code on github, Project Website

Talk 5: Project Tango and Visual loop-closure for image-2-image constraints
Simply put, Google's Project Tango is the world's first attempt at commercializing SLAM. Simon Lynen from Google Zurich (formerly ETH Zurich) came to the workshop with a Tango live demo (on a tablet) and a presentation on what's new in the world of Tango. In case you don't already know, Google wants to put SLAM capabilities into the next generation of Android devices.




Google's Project Tango needs no introduction.

The Project Tango presentation discussed a new way of doing loop closure by finding certain patterns in the image-to-image matching matrix. This comes from the “Placeless Place Recognition” work. They also do online bundle adjustment w/ vision-based loop closure.


Loop Closure inside a Project Tango? Lynen et al's Placeless Place Recognition. The image-to-image matrix reveals a new way to look for loop-closure. See the algorithm in action in this youtube video.

The Project Tango folks are also working on combining multiple crowd-sourced maps at Google, where the goal is to combine multiple mini-maps created by different people using Tango-equipped devices.

Simon showed a video of mountain bike trail tracking which is actually quite difficult in practice. The idea is to go down a mountain bike trail using a Tango device and create a map, then the follow-up goal is to have a separate person go down the trail. This currently “semi-works” when there are a few hours between the map building and the tracking step, but won’t work across weeks/months/etc. 

During the Tango-related discussion, Richard Newcombe pointed out that the “features” used by Project Tango are quite primitive w.r.t. getting a deeper understanding of the environment, and it appears that Project Tango-like methods won't work on outdoor scenes where the world is plagued by non-rigidity, massive illumination changes, etc.  So are we to expect different systems being designed for outdoor systems or will Project Tango be an indoor mapping device?

Related: Placeless Place Recognition. Lynen, S. ; Bosse, M. ; Furgale, P. ; Siegwart, R. In 3DV 2014.

Talk 6: ElasticFusion is DenseSLAM without a pose-graph
ElasticFusion is a dense SLAM technique which requires an RGBD sensor like the Kinect. Getting a high-quality 3D scan of a single room in 2-3 minutes is pretty cool. A pose-graph is used behind the scenes of many (if not most) SLAM systems, but this technique takes a different (map-centric) approach. The approach focuses on building a map, but the trick is that the map is deformable, hence the name ElasticFusion. The “Fusion” part of the name is an homage to KinectFusion, which was one of the first high-quality Kinect-based reconstruction pipelines. Surfels are used as the underlying primitives.


Image from Kintinuous, an early version of Whelan's Elastic Fusion.


Recovering light sources: we were given a sneak peek at new unpublished work from Imperial College London / Dyson Robotics Lab. The idea is that by detecting the light source direction and detecting specularities, you can improve 3D reconstruction results. There were cool videos of recovering light source locations, and the method works for up to 4 separate lights.

Related: Map-centric SLAM with ElasticFusion presentation slides
Related: ElasticFusion: Dense SLAM Without A Pose Graph. Whelan, Thomas and Leutenegger, Stefan and Salas-Moreno, Renato F and Glocker, Ben and Davison, Andrew J. In RSS 2015.

Talk 7: Richard Newcombe’s DynamicFusion
Richard Newcombe (whose recently formed company was acquired by Oculus) was the last presenter.  It's really cool to see the person behind DTAM, KinectFusion, and DynamicFusion now working in the VR space.


Newcombe's Dynamic Fusion algorithm. The technique won the prestigious CVPR 2015 best paper award, and to see it in action just take a look at the authors' DynamicFusion Youtube video.


Related: DynamicFusion: Reconstruction and Tracking of Non-rigid Scenes in Real-Time, Richard A. Newcombe, Dieter Fox, Steven M. Seitz. In CVPR 2015. [pdf] [Best-Paper winner]
Related: SLAM++: Simultaneous Localisation and Mapping at the Level of Objects Renato F. Salas-Moreno, Richard A. Newcombe, Hauke Strasdat, Paul H. J. Kelly and Andrew J. Davison (CVPR 2013)
Related: KinectFusion: Real-Time Dense Surface Mapping and Tracking. Richard A. Newcombe, Shahram Izadi, Otmar Hilliges, David Molyneaux, David Kim, Andrew J. Davison, Pushmeet Kohli, Jamie Shotton, Steve Hodges, Andrew Fitzgibbon (ISMAR 2011, Best paper award!)


Workshop Demos
During the demo sessions (held in the middle of the workshop), many of the presenters showed off their SLAM systems in action. Many of these systems are available as open-source (free for non-commercial use?) packages, so if you’re interested in real-time SLAM, downloading the code is worth a shot. However, the one demo which stood out was Andrew Davison’s showcase of his MonoSLAM system from 2004. Andy had to revive his 15-year-old laptop (which was running Redhat Linux) to show off his original system, running on the original hardware. If the computer vision community is ever going to decide on a “retro-vision” demo session, I’m just going to go ahead and nominate Andy for the best-paper prize, right now.


Andy's Retro-Vision SLAM Setup (Pictured on December 18th, 2015)


It was interesting to watch the SLAM system experts wave their USB cameras around, showing how their systems build 3D maps of the desk-sized area around their laptops.  If you carefully look at the way these experts move the camera around (i.e., smooth circular motions), you can almost tell how long a person has been working with SLAM. When non-experts hold the camera, the probability of tracking failure is significantly higher.

I had the pleasure of speaking with Andy during the demo session, and I was curious which line of work (in the past 15 years) surprised him the most. His reply was that PTAM, which showed how to perform real-time bundle adjustment, surprised him the most. The PTAM system was essentially a MonoSLAM++ system, but the significantly improved tracking results were due to taking a heavyweight algorithm (bundle adjustment) and making it real-time — something which Andy did not believe was possible in the early 2000s.

Part III: Deep Learning vs SLAM

The SLAM panel discussion was a lot of fun. Before we jump to the important Deep Learning vs SLAM discussion, I should mention that each of the workshop presenters agreed that semantics are necessary to build bigger and better SLAM systems. There were lots of interesting mini-conversations about future directions. During the debates, Marc Pollefeys (a well-known researcher in SfM and Multiple-View Geometry) reminded everybody that Robotics is the killer application of SLAM and suggested we keep an eye on the prize. This is quite surprising since SLAM was traditionally applied to Robotics problems, but the lack of Robotics success in the last few decades (Google Robotics?) has shifted the focus of SLAM away from Robots and towards large-scale map building (ala Google Maps) and Augmented Reality. Nobody at this workshop talked about Robots.

Integrating semantic information into SLAM
There was a lot of interest in incorporating semantics into today’s top-performing SLAM systems. When it comes to semantics, the SLAM community is unfortunately stuck in the world of bags-of-visual-words, and doesn't have new ideas on how to integrate semantic information into their systems. On the other end, we’re now seeing real-time semantic segmentation demos (based on ConvNets) popping up at CVPR/ICCV/ECCV, and in my opinion SLAM needs Deep Learning as much as the other way around.


Integrating semantics into SLAM is often talked about, but it is easier said than done.
Figure 6.9 (page 142) from Moreno's PhD thesis: Dense Semantic SLAM

"Will end-to-end learning dominate SLAM?"
Towards the end of the SLAM workshop panel, Dr. Zeeshan Zia asked a question which startled the entire room and led to a memorable, energy-filled discussion. You should have seen the look on the panel’s faces. It was a bunch of geometers being thrown a fireball of deep learning. Their facial expressions suggested a mix of bewilderment, anger, and disgust. "How dare you question us?" they were thinking. And it is only during these fleeting moments that we can truly appreciate the conference experience. Zia's question was essentially: Will end-to-end learning soon replace the mostly manual labor involved in building today’s SLAM systems?

Zia's question is very important because end-to-end trainable systems have been slowly creeping up on many advanced computer science problems, and there's no reason to believe SLAM will be an exception. A handful of the presenters pointed out that current SLAM systems rely on too much geometry for a pure deep-learning based SLAM system to make sense -- we should use learning to make the point descriptors better, but leave the geometry alone. Just because you can use deep learning to make a calculator, it doesn't mean you should.


Learning Stereo Similarity Functions via ConvNets, by Yann LeCun and collaborators.


While many of the panel speakers responded with a somewhat affirmative "no", it was Newcombe who surprisingly championed what the marriage of Deep Learning and SLAM might look like.

Newcombe's Proposal: Use SLAM to fuel Deep Learning
Although Newcombe didn’t provide much evidence or ideas on how Deep Learning might help SLAM, he provided a clear path on how SLAM might help Deep Learning.  Think of all those maps that we've built using large-scale SLAM and all those correspondences that these systems provide — isn’t that a clear path for building terascale image-image "association" datasets which should be able to help deep learning? The basic idea is that today's SLAM systems are large-scale "correspondence engines" which can be used to generate large-scale datasets, precisely what needs to be fed into a deep ConvNet.
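A small sketch of what such a correspondence engine could emit, assuming SLAM has already given us per-frame depth and 4x4 world-from-camera poses (all variable names here are hypothetical): back-project a pixel from frame A and reproject it into frame B to produce a ground-truth pixel-to-pixel association.

```python
import numpy as np

def correspondence_from_slam(depth_a, pose_a, pose_b, K, u, v):
    """Generate a ground-truth pixel correspondence between two frames from
    SLAM output (depth map for frame A plus world-from-camera poses). This is
    the kind of large-scale "association" label a SLAM system could
    mass-produce for training a ConvNet. Returns None if the point does not
    project into frame B."""
    z = depth_a[v, u]
    p_cam_a = z * np.linalg.inv(K) @ np.array([u, v, 1.0])   # back-project pixel
    p_world = pose_a @ np.append(p_cam_a, 1.0)               # to world coordinates
    p_cam_b = np.linalg.inv(pose_b) @ p_world                # into frame B's camera
    if p_cam_b[2] <= 0:
        return None                                          # behind the camera
    q = K @ p_cam_b[:3]
    u2, v2 = q[0] / q[2], q[1] / q[2]
    h, w = depth_a.shape
    if not (0 <= u2 < w and 0 <= v2 < h):
        return None                                          # outside frame B
    return (u, v), (u2, v2)
```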

Concluding Remarks
There is quite a large disconnect between the kind of work done at the mainstream ICCV conference (heavy on machine learning) and the kind of work presented at the real-time SLAM workshop (heavy on geometric methods like bundle adjustment). The mainstream Computer Vision community has witnessed several mini-revolutions within the past decade (e.g., Dalal-Triggs, DPM, ImageNet, ConvNets, R-CNN) while the SLAM systems of today don’t look very different than they did 8 years ago. The Kinect sensor has probably been the single largest game changer in SLAM, but the fundamental algorithms remain intact.
Integrating semantic information: The next frontier in Visual SLAM. 
Brain image from Arwen Wallington's blog post.

Today’s SLAM systems help machines geometrically understand the immediate world (i.e., build associations in a local coordinate system) while today’s Deep Learning systems help machines reason categorically (i.e., build associations across distinct object instances). In conclusion, I share Newcombe and Davison's excitement about Visual SLAM, as vision-based algorithms are going to turn Augmented and Virtual Reality into billion dollar markets. However, we should not forget to keep our eyes on the "trillion-dollar" market, the one that's going to redefine what it means to "work" -- namely Robotics. The day of Robot SLAM will come soon.

          Deep Learning vs Big Data: Who owns what?        
In order to learn anything useful, large-scale multi-layer deep neural networks (aka Deep Learning systems) require a large amount of labeled data. There is clearly a need for big data, but only a few places where big visual data is available. Today we'll take a look at one of the most popular sources of big visual data, peek inside a trained neural network, and ask ourselves some data/model ownership questions. The fundamental question to keep in mind is the following: "Are the learned weights of a neural network derivative works of the input images?" In other words, when deep learning touches your data, who owns what?



Background: The Deep Learning "Computer Vision Recipe"
One of today's most successful machine learning techniques is called Deep Learning. The broad interest in Deep Learning is backed by some remarkable results on real-world data interpretation tasks dealing with speech[1], text[2], and images[3]. Deep learning and object recognition techniques have been pioneered by academia (University of Toronto, NYU, Stanford, Berkeley, MIT, CMU, etc), picked up by industry (Google, Facebook, Snapchat, etc), and are now fueling a new generation of startups ready to bring visual intelligence to the masses (Clarifai.com, Metamind.io, Vision.ai, etc). And while it's still not clear where Artificial Intelligence is going, Deep Learning will be a key player.

Related blog postDeep Learning vs Machine Learning vs Pattern Recognition
Related blog postDeep Learning vs Probabilistic Graphical Models vs Logic

For visual object recognition tasks, the most popular models are Convolutional Neural Networks (also known as ConvNets or CNNs). They can be trained end-to-end without manual feature engineering, but this requires a large set of training images (sometimes called big data, or big visual data). These large neural networks start out as a Tabula Rasa (or "blank slate") and the full system is trained in an end-to-end fashion using a heavily optimized implementation of Backpropagation (informally called "backprop"). Backprop is nothing but the chain rule you learned in Calculus 101 and today's deep neural networks are trained in almost the same way they were trained in the 1980s. But today's highly-optimized implementations of backprop are GPU-based and can process orders of magnitude more data than was approachable in the pre-internet pre-cloud pre-GPU golden years of Neural Networks. The output of the deep learning training procedure is a set of learned weights for the different layers defined in the model architecture -- millions of floating point numbers representing what was learned from the images. So what's so interesting about the weights? It's the relationship between the weights and the original big data that will be under scrutiny today.
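To see just how much backprop is the chain rule, here is a hand-written forward and backward pass for a tiny two-layer network on a single example. This is purely illustrative; real systems do the same thing over mini-batches with heavily optimized GPU kernels.

```python
import numpy as np

# Backprop really is just the chain rule: a two-layer net, forward then
# backward, written out by hand for a single toy training example.
rng = np.random.default_rng(0)
x = rng.standard_normal(4)            # input
y = 1.0                               # target
W1, W2 = rng.standard_normal((8, 4)), rng.standard_normal((1, 8))

# forward pass
h_pre = W1 @ x                        # layer-1 pre-activation
h = np.maximum(h_pre, 0)              # ReLU
y_hat = W2 @ h                        # output
loss = 0.5 * (y_hat - y) ** 2

# backward pass: apply the chain rule, layer by layer
d_yhat = y_hat - y                    # dL/dy_hat
dW2 = np.outer(d_yhat, h)             # dL/dW2
d_h = W2.T @ d_yhat                   # dL/dh
d_hpre = d_h * (h_pre > 0)            # back through the ReLU
dW1 = np.outer(d_hpre, x)             # dL/dW1

# one SGD step on the learned weights
W1 -= 0.1 * dW1
W2 -= 0.1 * dW2
print("loss:", float(loss))
```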

"Are weights of a trained network based on ImageNet a derived work, a cesspool of millions of copyright claims? What about networks trained to approximate another ImageNet network?"
[This question was asked on HackerNews by kastnerkyle in the comments of A Revolutionary Technique That Changed Machine Vision.]

In the context of computer vision, this question truly piqued my interest, and as we start seeing robots and AI-powered devices enter our homes I expect much more serious versions of this question to arise in the upcoming decade. Let's see how some of these questions are being addressed in 2015.

1. ImageNet: Non-commercial Big Visual Data

Let's first take a look at the most common data source for Deep Learning systems designed to recognize a large number of different objects, namely ImageNet[4]. ImageNet is the de-facto source of big visual data for computer vision researchers working on large scale object recognition and detection. The dataset debuted in a 2009 CVPR paper by Fei-Fei Li's research group and was put in place to replace both PASCAL datasets (which lacked size and variety) and LabelMe datasets (which lacked standardization). ImageNet grew out of Caltech101 (a 2004 dataset focusing on image categorization, also pioneered by Fei-Fei Li) so personally I still think of ImageNet as something like "Stanford10^N". ImageNet has been a key player in organizing the scale of data that was required to push object recognition to its new frontier, the deep learning phase.

ImageNet has over 15 million images in its database as of May 1st, 2015.


Problem: Lots of extremely large datasets are mined from internet images, but these images often come with their own copyright.  This prevents anybody from simply collecting and selling such images, so from a commercial point of view, some care has to be taken when creating such a dataset.  For research to keep pushing the state-of-the-art on real-world recognition problems, we have to use standard big datasets (representative of what is found in the real-world internet), foster a strong sense of community centered around sharing results, and maintain the copyrights of the original sources.

Solution: ImageNet decided to publicly provide links to the dataset images so that they can be downloaded without having to be hosted on an University-owned server. The ImageNet website only serves the image thumbnails and provides a copyright infringement clause together with instructions where to file a DMCA takedown notice. The dataset organizers provide the entire dataset only after signing a terms of access, prohibiting commercial use. See the ImageNet clause below (taken on May 5th, 2015).

"ImageNet does not own the copyright of the images. ImageNet only provides thumbnails and URLs of images, in a way similar to what image search engines do. In other words, ImageNet compiles an accurate list of web images for each synset of WordNet. For researchers and educators who wish to use the images for non-commercial research and/or educational purposes, we can provide access through our site under certain conditions and terms."

2. Caffe: Unrestricted Use Deep Learning Models

Now that we have a good idea where to download big visual data and an understanding of the terms that apply, let's take a look at the other end of the spectrum: the output of the Deep Learning training procedure. We'll take a look at Caffe, one of the most popular Deep Learning libraries, which was engineered to handle ImageNet-like data.  Caffe provides an ecosystem for sharing models (the Model Zoo), and is becoming an indispensable tool for today's computer vision researcher. Caffe is developed at the Berkeley Vision and Learning Center (BVLC) and by community contributors -- it is open source.

Problem: As a project that started at a University, Caffe's goal is to be the de-facto standard for creating, training, and sharing Deep Learning models. The shared models were initially licensed for non-commercial use, but the problem is that a new wave of startups is using these techniques, so there must be a licensing agreement which allows Universities, large companies, and startups to explore the same set of pretrained models.

Solution: The current model licensing for Caffe is unrestricted use. This is really great for a broad range of hackers, scientists, and engineers.  The models used to be shared with a non-commercial clause. Below is the entire model licensing agreement from the Model License section of Caffe (taken on May 5th, 2015).

"The Caffe models bundled by the BVLC are released for unrestricted use. 

These models are trained on data from the ImageNet project and training data includes internet photos that may be subject to copyright. 

Our present understanding as researchers is that there is no restriction placed on the open release of these learned model weights, since none of the original images are distributed in whole or in part. To the extent that the interpretation arises that weights are derivative works of the original copyright holder and they assert such a copyright, UC Berkeley makes no representations as to what use is allowed other than to consider our present release in the spirit of fair use in the academic mission of the university to disseminate knowledge and tools as broadly as possible without restriction." 

3. Vision.ai: Dataset generation and training in your home 

Deep Learning learns a summary of the input data, but what happens if a different kind of model memorizes bits and pieces of the training data? And more importantly what if there are things inside the memorized bits which you might not want shared with outsiders?  For this case study, we'll look at Vision.ai, and their real-time computer vision server which is designed to simultaneously create a dataset and learn about an object's appearance. Vision.ai software can be applied to real-time training from videos as well as live webcam streams.

Instead of starting with big visual data collected from internet images (like ImageNet), the vision.ai training procedure is based on a person waving an object of interest in front of the webcam. The user bootstraps the learning procedure with an initial bounding box, and the algorithm continues learning hands-free. As the algorithm learns, it stores a partial history of what it previously saw, effectively creating its own dataset on the fly. Because the vision.ai convolutional neural networks are designed for detection (where an object only occupies a small portion of the image), there is a large amount of background data present inside the collected dataset. At the end of the training procedure you get both the Caffe-esque bit (the learned weights) and the ImageNet bit (the collected images). So what happens when it's time to share the model?

A user training a cup detector using vision.ai's real-time detector training interface


Problem: Training in your home means that potentially private and sensitive information is contained inside the backgrounds of the collected images. If you train in your home and make the resulting object model public, think twice about what you're sharing. Sharing can also be problematic if you have trained an object detector from a copyrighted video/images and want to share/sell the resulting model.

Solution: When you save a vision.ai model to disk, you get both a compiled model and the full model. The compiled model is the full model sans the images (thus much smaller). This allows you to maintain fully editable models on your local computer, and share the compiled model (essentially only the learned weights), without the chance of anybody else peeking into your living room. Vision.ai's computer vision server called VMX can run both compiled and uncompiled models; however, only uncompiled models can be edited and extended. In addition, vision.ai provides their vision server as a standalone install, so that all of the training images and computations can reside on your local computer. In brief, vision.ai's solution is to allow you to choose whether you want to run the computations in the cloud or locally, and whether you want to distribute full models (with background images) or the compiled models (solely what is required for detection). When it comes to sharing the trained models and/or created datasets, you are free to choose your own licensing agreement.

4. Open Problems for Licensing Memory-based Machine Learning Models

Deep Learning methods aren't the only techniques applicable to object recognition. What if our model was a Nearest-Neighbor classifier using raw RGB pixels? A Nearest Neighbor Classifier is a memory based classifier which memorizes all of the training data -- the model is the training data. It would be contradictory to license the same set of data differently if one day it was viewed as training data and another day as the output of a learning algorithm. I wonder if there is a way to reconcile the kind of restrictive non-commercial licensing behind ImageNet with the unrestricted licensing use strategy of Caffe Deep Learning Models. Is it possible to have one hacker-friendly data/model license agreement to rule them all?
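For concreteness, here is what such a memory-based model looks like: the "training" step just stores the raw pixels, so distributing the model is literally distributing the data. A toy sketch with random stand-in images.

```python
import numpy as np

class NearestNeighborClassifier:
    """A memory-based classifier: 'training' is just storing the data,
    so the model literally is the training set (raw RGB pixels here)."""
    def fit(self, images, labels):
        self.X = images.reshape(len(images), -1).astype(float)  # the model == the data
        self.y = np.asarray(labels)
        return self

    def predict(self, image):
        d = np.linalg.norm(self.X - image.reshape(-1).astype(float), axis=1)
        return self.y[np.argmin(d)]

# Toy usage with random 32x32 RGB "images".
rng = np.random.default_rng(0)
train = rng.integers(0, 256, (10, 32, 32, 3))
clf = NearestNeighborClassifier().fit(train, labels=list("abcdefghij"))
print(clf.predict(train[3]))   # returns 'd' -- it memorized the image exactly
```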

Conclusion

Don't be surprised if neural network upgrades come as part of your future operating system. As we transition from a data economy (sharing images) to a knowledge economy (sharing neural networks), legal/ownership issues will pop up. I hope that the three scenarios I covered today (big visual data, sharing deep learning models, and training in your home) will help you think about the future legal issues that might come up when sharing visual knowledge. When AI starts generating its own art (maybe by re-synthesizing old pictures), legal issues will pop up. And when your competitor starts selling your models and/or data, legal issues will resurface. Don't be surprised if the MIT license vs. GPL license vs. Apache License debate resurges in the context of pre-trained deep learning models. Who knows, maybe AI Law will become the next big thing.

References
[1] Deep Speech: Accurate Speech Recognition with GPU-Accelerated Deep Learning: NVIDIA dev blog post about Baidu's work on speech recognition using Deep Learning. Andrew Ng is working with Baidu on Deep Learning.

[2] Text Understanding from Scratch: Arxiv paper from Facebook about end-to-end training of text understanding systems using ConvNets. Yann Lecun is working with Facebook on Deep Learning.

[3] ImageNet Classification with Deep Convolutional Neural Networks. Seminal 2012 paper from the Neural Information and Processing Systems (NIPS) conference which showed breakthrough performance from a deep neural network. Paper came out of University of Toronto, but now most of these guys are now at Google.  Geoff Hinton is working with Google on Deep Learning.

[4] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li and L. Fei-Fei, ImageNet: A Large-Scale Hierarchical Image Database. IEEE Computer Vision and Pattern Recognition (CVPR), 2009.

Jia Deng is now an assistant professor at the University of Michigan and he is growing his research group. If you're interested in starting a PhD in deep learning and vision, check out his call for prospective students. This might be a younger version of Andrew Ng.

Richard Socher is the CTO and Co-Founder of MetaMind, a new startup in the Deep Learning space. They are VC-backed and have plenty of room to grow.

Jia Li is now Head of Research at Snapchat, Inc. I can't say much, but take a look at the recent VentureBeat article: Snapchat is quietly building a research team to do deep learning on images, videos. Jia and I overlapped at Google Research back in 2008.

Fei-Fei Li is currently the Director of the Stanford Artificial Intelligence Lab and the Stanford Vision Lab. See the article on Wired: If we want our machines to think, we need to teach them to see. Yann, you have some competition.

Yangqing Jia created the Caffe project during his PhD at UC Berkeley. He is now a research scientist at Google.

Tomasz Malisiewicz is the Co-Founder of Vision.ai, which focuses on real-time training of vision systems -- something which is missing in today's Deep Learning systems. Come say hi at CVPR.



          Making Visual Data a First-Class Citizen        
“Above all, don't lie to yourself. The man who lies to himself and listens to his own lie comes to a point that he cannot distinguish the truth within him, or around him, and so loses all respect for himself and for others. And having no respect he ceases to love.” ― Fyodor Dostoyevsky, The Brothers Karamazov


City Forensics: Using Visual Elements to Predict Non-Visual City Attributes

To respect the power and beauty of machine learning algorithms, especially when they are applied to the visual world, let's take a look at three recent applications of learning-based "computer vision" to computer graphics. Researchers in computer graphics are known for producing truly captivating illustrations of their results, so this post is going to be very visual. Now is your chance to sit back and let the pictures do the talking.

Can you predict things simply by looking at street-view images?

Let's say you're going to visit an old-friend in a foreign country for the first time. You've never visited this country before and have no idea what kind of city/neighborhood your friend lives in. So you decide to get a sneak peak -- you enter your friend's address into Google Street View.

Most people can look at Google Street View images in a given location and estimate attributes such as "sketchy," "rural," "slum-like," "noisy" for the given neighborhood. TLDR; A person is a pretty good visual recommendation engine.

Can you predict if this looks like a safe location? 
(Screenshot of Street view for Manizales, Colombia on Google Earth)

Can a computer program predict things by looking at images? If so, then these kinds of computer programs could be used to automatically generate semantic map layovers (see the crime prediction overlay from the first figure), help organize fast-growing cities (computer vision meets urban planning?), and ultimately bring about a new generation of match-making "visual recommendation engines" (a whole suite of new startups).

Before I discuss the research paper behind this idea, here are two cool things you could do (in theory) with a non-visual data prediction algorithm. There are plenty of great product ideas in this space -- just be creative.

Startup Idea #1: Avoiding sketchy areas when traveling abroad 
A Personalized location recommendation engine could be used to find locations in a city that I might find interesting (techie coffee shop for entrepreneurs, a park good for frisbee) subject to my constraints (near my current location, in a low-danger area, low traffic).  Below is the kind of place you want to avoid if you're looking for a coffee and a place to open up your laptop to do some work.

Google Street Maps, Morumbi São Paulo: slum housing (image from geographyfieldwork.com)

Startup Idea #2: Apartment Pricing and Marketing from Images
Visual recommendation engines could be used to predict the best images to represent an apartment for an Airbnb listing.  It would be great if Airbnb had a feature that would let you upload videos of your apartment and would predict the set of static images that best depict your apartment to maximize earning potential. I'm sure that Airbnb users would pay for this if it were available for a small extra charge. The same computer vision prediction idea can be applied to home pricing on Zillow, Craigslist, and anywhere else that pictures of for-sale items are shared.

Google image search result for "Good looking apartment". Can computer vision be used to automatically select pictures that will make your apartment listing successful on Airbnb?

Part I. City Forensics: Using Visual Elements to Predict Non-Visual City Attributes


The Berkeley Computer Graphics Group has been working on predicting non-visual attributes from images, so before I describe their approach, let me discuss how Berkeley's Visual Elements relate to Deep Learning.

Predicting Chicago Thefts from San Francisco data. Predicting Philadelphia Housing Prices from Boston data. From City Forensics paper.



Deep Learning vs Mid-level Patch Discovery (Technical Discussion)
You might think that non-visual data prediction from images (if even possible) will require a deep understanding of the image and thus these approaches must be based on a recent ConvNet deep learning method. Obviously, knowing the locations and categories associated with each object in a scene could benefit any computer vision algorithm.  The problem is that such general purpose CNN recognition systems aren't powerful enough to parse Google Street View images, at least not yet.

Another extreme is to train classifiers on entire images.  This was initially done when researchers were using GIST, but there are just too many nuisance pixels inside a typical image, so it is better to focus your machine learning a subset of the image.  But how do you choose the subset of the image to focus on?

There exist computer vision algorithms that can mine a large dataset of images and automatically extract meaningful, repeatable, and detectable mid-level visual patterns. These methods are not label-based and work really well when there is an underlying theme tying together a collection of images. The set of all Google Street View Images from Paris satisfies this criterion.  Large collections of random images from the internet must be labeled before they can be used to produce the kind of stellar results we all expect out of deep learning. The Berkeley Group uses visual elements automatically mined from images as the core representation.  Mid-level visual patterns are simply chunks of the image which correspond to repeatable configurations -- they sometimes contain entire objects, parts of objects, and popular multiple object configurations. (See Figure below)  The mid-level visual patterns form a visual dictionary which can be used to represent the set of images. Different sets of images (e.g., images from two different US cities) will have different mid-level dictionaries. These dictionaries are similar to "Visual Words" but their creation uses more SVM-like machinery.

The patch mining algorithm is known as mid-level patch discovery. You can think of mid-level patch discovery as a visually intelligent K-means clustering algorithm, but for really really large datasets. Here's a figure from the ECCV 2012 paper which introduced mid-level discriminative patches.

Unsupervised Discovery of Mid-Level Discriminative Patches

Unsupervised Discovery of Mid-Level Discriminative Patches. Saurabh Singh, Abhinav Gupta and Alexei A. Efros. In European Conference on Computer Vision (2012).
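A heavily simplified sketch of the discovery loop, under my own assumptions (random vectors standing in for HOG descriptors of image patches): cluster candidate patches, train one linear SVM per cluster against a generic negative set, and keep the clusters whose detectors fire most consistently. The real algorithm iterates this with cross-validation splits.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
pos_desc = rng.standard_normal((2000, 512))   # stand-in: patches from the target image set
neg_desc = rng.standard_normal((2000, 512))   # stand-in: patches from a generic "rest of the world" set

clusters = KMeans(n_clusters=20, n_init=5, random_state=0).fit_predict(pos_desc)

kept = []
for c in range(20):
    members = pos_desc[clusters == c]
    if len(members) < 10:
        continue                                  # skip tiny clusters
    X = np.vstack([members, neg_desc])
    y = np.r_[np.ones(len(members)), np.zeros(len(neg_desc))]
    svm = LinearSVC(C=0.1, max_iter=5000).fit(X, y)
    score = svm.decision_function(members).mean() # how consistently the detector fires
    kept.append((score, c, svm))

kept.sort(reverse=True, key=lambda t: t[0])       # rank clusters by "discriminativeness"
top_elements = [c for _, c, _ in kept[:5]]        # the most distinctive visual elements
print("top visual-element clusters:", top_elements)
```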

I should also point out that non-final layers in a pre-trained CNN could also be used for representing images, without the need to use a descriptor such as HOG. I would expect the performance to improve, so the question is perhaps: How long until somebody publishes an awesome unsupervised CNN-based patch discovery algorithm? I'm sure a handful of researchers are already working on it. :-)

Related Blog Post: From feature descriptors to deep learning: 20 years of computer vision
Related Blog Post: Deep Learning vs Machine Learning vs Pattern Recognition

The City Forensics paper from Berkeley tries to map the visual appearance of cities (as obtained from Google Street View Images) to non-visual data like crime statistics, housing prices and population density.  The basic idea is to 1.) mine discriminative patches from images and 2.) train a predictor which can map these visual primitives to non-visual data. While the underlying technique is that of mid-level patch discovery combined with Support Vector Regression (SVR), the result is an attribute-specific distribution over GPS coordinates.  Such a distribution should be appreciated for its own aesthetic value. I personally love custom data overlays.

City Forensics: Using Visual Elements to Predict Non-Visual City Attributes. Sean Arietta, Alexei A. Efros, Ravi Ramamoorthi, Maneesh Agrawala. In IEEE Transactions on Visualization and Computer Graphics (TVCG), 2014.
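The second stage can be sketched as follows (all arrays are random stand-ins, and the feature choice is my guess at the spirit of the method): describe each GPS location by its visual-element detector responses and regress the non-visual attribute with Support Vector Regression, then predict that attribute at locations the regressor never saw.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
n_locations, n_elements = 500, 200
element_responses = rng.standard_normal((n_locations, n_elements))  # stand-in detector activations per location
attribute = rng.uniform(0, 1, n_locations)                          # stand-in attribute, e.g. a normalized theft rate

# Support Vector Regression from visual-element responses to the attribute.
model = SVR(kernel="rbf", C=1.0, epsilon=0.05).fit(element_responses, attribute)

# Predict the attribute at unseen locations (e.g. another city), which is how
# a predictor trained in one city can be transferred to another.
new_responses = rng.standard_normal((10, n_elements))
print(model.predict(new_responses))
```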


Part II. The Selfie 2.0: Computer Vision as a Sidekick


Sometimes you just want the algorithm to be your sidekick. Let's talk about a new and improved method for using vision algorithms and the wisdom of the crowds to select better pictures of your face. While you might think of an improved selfie as a silly application, you do want to look "professional" in your professional photos, sexy in your "selfies" and "friendly" in your family pictures. An algorithm that helps you get the desired picture is an algorithm the whole world can get behind.



Attractiveness versus Time. From MirrorMirror Paper.

The basic idea is to collect a large video of a single person which spans different emotions, times of day, different days, or whatever condition you would like to vary.  Given this video, you can use crowdsourcing to label frames based on a property like attractiveness or seriousness.  Given these labeled frames, you can then train a standard HOG detector and predict one of these attributes on new data. Below is a figure which shows the 10 best shots of the child (lots of smiling and eye contact) and the worst 10 shots (bad lighting, blur, red-eye, no eye contact).


10 good shots, 10 worst shots. From MirrorMirror Paper.
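To make that pipeline concrete, here is a rough sketch of the train-a-classifier-per-attribute idea using HOG features and a linear classifier. scikit-image and scikit-learn stand in for whatever the paper actually used, and the frames, labels, and parameters are hypothetical, not taken from the MirrorMirror work.

    import numpy as np
    from skimage.feature import hog
    from sklearn.svm import LinearSVC

    def featurize(gray_face):
        # gray_face: 2D array containing a cropped, grayscale face
        # (all crops must be the same size so the HOG vectors line up)
        return hog(gray_face, pixels_per_cell=(8, 8), cells_per_block=(2, 2))

    def train_attribute_classifier(frames, labels):
        # frames: list of 2D arrays; labels: 1 = has the attribute, 0 = does not
        # (e.g., crowdsourced "good shot" / "bad shot" votes)
        X = np.stack([featurize(f) for f in frames])
        clf = LinearSVC(C=1.0)
        clf.fit(X, labels)
        return clf

    def score_frame(clf, gray_face):
        # Higher decision value = more of the attribute; sort frames by this score
        return clf.decision_function([featurize(gray_face)])[0]

Running the trained classifier over every frame of a new video and sorting by the decision value gives you a ranking like the one in the figure above.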

You can also collect a video of yourself as you go through a sequence of different emotions, get people to label frames, and build a system which can predict an attribute such as "seriousness".

Faces ranked from Most serious to least serious. From MirrorMirror Paper.


In this work, labeling was necessary for taking better selfies.  But if half of the world is taking pictures, while the other half is voting pictures up and down (or Tinder-style swiping left and right), then I think the data collection and data labeling effort won't be a big issue in years to come. Nevertheless, this is a cool way of scoring your photos. Regarding consumer applications, this is something that Google, Snapchat, and Facebook will probably integrate into their products very soon.

Mirror Mirror: Crowdsourcing Better Portraits. Jun-Yan Zhu, Aseem Agarwala, Alexei A. Efros, Eli Shechtman and Jue Wang. In ACM Transactions on Graphics (SIGGRAPH Asia), 2014.

Part III. What does it all mean? I'm ready for the cat pictures.


This final section revisits an old, simple, and powerful trick in computer vision and graphics. If you know how to compute the average of a sequence of numbers, then you'll have no problem understanding what an average image (or "mean image") is all about. And if you've read this far, don't worry, the cat picture is coming soon.

Computing average images (or "mean" images) is one of those tricks that I was introduced to very soon after I started working at CMU.  Antonio Torralba, who has always had "a few more visualization tricks" up his sleeve, started computing average images (in the early 2000s) to analyze scenes as well as datasets collected as part of the LabelMe project at MIT. There's really nothing more to the basic idea beyond simply averaging a bunch of pictures.

Teaser Image from AverageExplorer paper.

Usually this kind of averaging is done informally in research, to make some throwaway graphic, or make cool web-ready renderings.  It's great seeing an entire paper dedicated to a system which explores the concept of averaging even further. It took about 15 years of use until somebody was bold enough to write a paper about it. When you perform a little bit of alignment, the mean pictures look really awesome. Check out these cats!



Aligned cat images from the AverageExplorer paper. 
I want one! (Both the algorithm and a Platonic cat)

The AverageExplorer paper extends simple image averaging with some new tricks which make the operations much more effective. I won't say much about the paper (the link is below), just take a peek at some of the coolest mean cats I've ever seen (visualized above) or a jaw-dropping way to look at community collected landmark photos (Oxford bridge mean image visualized below).

Aligned bridges from AverageExplorer paper. 
I wish Google would make all of Street View look like this.

AverageExplorer: Interactive Exploration and Alignment of Visual Data Collections. Jun-Yan Zhu, Yong Jae Lee, and Alexei A. Efros. In SIGGRAPH 2014.

Averaging images is a really powerful idea.  Want to know what your magical classifier is tuned to detect?  Compute the top detections and average them.  Soon enough you'll have a good idea of what's going on behind the scenes.
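If you want to try the trick yourself, a mean image is just a pixel-wise average. Here is a minimal sketch with NumPy and Pillow; the folder of roughly aligned crops (e.g., your classifier's top detections) and the crop size are hypothetical choices for illustration.

    import glob
    import numpy as np
    from PIL import Image

    paths = glob.glob("top_detections/*.jpg")   # hypothetical folder of crops
    size = (128, 128)                           # resize everything to a common size

    accumulator = np.zeros((size[1], size[0], 3), dtype=np.float64)
    for path in paths:
        img = Image.open(path).convert("RGB").resize(size)
        accumulator += np.asarray(img, dtype=np.float64)

    mean_image = (accumulator / max(len(paths), 1)).astype(np.uint8)
    Image.fromarray(mean_image).save("mean_image.jpg")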

Conclusion


Allow me to mention the mastermind that helped bring most of these vision+graphics+learning applications to life.  There's an inimitable charm present in all of the works of Prof. Alyosha Efros -- a certain aesthetic that is missing from 2015's overly empirical zeitgeist.  He used to be at CMU, but recently moved back to Berkeley.

Being able to summarize several years' worth of research into a single computer generated graphic can go a long way to making your work memorable and inspirational. And maybe our lives don't need that much automation.  Maybe general purpose object recognition is too much? Maybe all we need is a little art? I want to leave you with a YouTube video from a recent 2015 lecture by Professor A.A. Efros titled "Making Visual Data a First-Class Citizen." If you want to hear the story in the master's own words, grab a drink and enjoy the lecture.

"Visual data is the biggest Big Data there is (Cisco projects that it will soon account for over 90% of internet traffic), but currently, the main way we can access it is via associated keywords. I will talk about some efforts towards indexing, retrieving, and mining visual data directly, without the use of keywords." ― A.A. Efros, Making Visual Data a First-Class Citizen




          Deep Learning vs Probabilistic Graphical Models vs Logic        
Today, let's take a look at three paradigms that have shaped the field of Artificial Intelligence in the last 50 years: Logic, Probabilistic Methods, and Deep Learning. The empirical, "data-driven", or big-data / deep-learning ideology triumphs today, but that wasn't always the case. Some of the earliest approaches to AI were based on Logic, and the transition from logic to data-driven methods has been heavily influenced by probabilistic thinking, something we will be investigating in this blog post.

Let's take a look back at Logic and Probabilistic Graphical Models and make some predictions on where the field of AI and Machine Learning is likely to go in the near future. We will proceed in chronological order.

Image from Coursera's Probabilistic Graphical Models course

1. Logic and Algorithms (Common-sense "Thinking" Machines)


A lot of early work on Artificial Intelligence was concerned with Logic, Automated Theorem Proving, and manipulating symbols. It should not be a surprise that John McCarthy's seminal 1959 paper on AI had the title "Programs with common sense."

If we peek inside one of the most popular AI textbooks, namely "Artificial Intelligence: A Modern Approach," we immediately notice that the beginning of the book is devoted to search, constraint satisfaction problems, first order logic, and planning. The third edition's cover (pictured below) looks like a big chess board (because being good at chess used to be a sign of human intelligence), and features a picture of Alan Turing (the father of computing theory) as well as a picture of Aristotle (one of the greatest classical philosophers, who had quite a lot to say about intelligence).

The cover of AIMA, the canonical AI text for undergraduate CS students

Unfortunately, logic-based AI brushes the perception problem under the rug, and I've argued quite some time ago that understanding how perception works is really the key to unlocking the secrets of intelligence. Perception is one of those things which is easy for humans and immensely difficult for machines. (To read more see my 2011 blog post, Computer Vision is Artificial Intelligence). Logic is pure and traditional chess-playing bots are very algorithmic and search-y, but the real world is ugly, dirty, and ridden with uncertainty.

I think most contemporary AI researchers agree that Logic-based AI is dead. The kind of world where everything can be perfectly observed, a world with no measurement error, is not the world of robotics and big-data.  We live in the era of machine learning, and numerical techniques triumph over first-order logic.  As of 2015, I pity the fool who prefers Modus Ponens over Gradient Descent.

Logic is great for the classroom and I suspect that once enough perception problems become "essentially solved" we will see a resurgence in Logic.  And while there will be plenty of open perception problems in the future, there will be scenarios where the community can stop worrying about perception and start revisiting these classical ideas. Perhaps in 2020.

Further reading: Logic and Artificial Intelligence from the Stanford Encyclopedia of Philosophy

2. Probability, Statistics, and Graphical Models ("Measuring" Machines)


Probabilistic methods in Artificial Intelligence came out of the need to deal with uncertainty. The middle part of the Artificial Intelligence a Modern Approach textbook is called "Uncertain Knowledge and Reasoning" and is a great introduction to these methods.  If you're picking up AIMA for the first time, I recommend you start with this section. And if you're a student starting out with AI, do yourself a favor and don't skimp on the math.

Intro to PDFs from Penn State's Probability Theory and Mathematical Statistics course

When most people think about probabilistic methods they think of counting.  In layman's terms it's fair to think of probabilistic methods as fancy counting methods.  Let's briefly take a look at what used to be the two competing methods for thinking probabilistically.

Frequentist methods are very empirical -- these methods are data-driven and make inferences purely from data.  Bayesian methods are more sophisticated and combine data-driven likelihoods with magical priors.  These priors often come from first principles or "intuitions" and the Bayesian approach is great for combining heuristics with data to make cleverer algorithms -- a nice mix of the rationalist and empiricist world views.

What is perhaps more exciting than the Frequentist vs. Bayesian flamewar is something known as Probabilistic Graphical Models.  This class of techniques comes from computer science, and even though Machine Learning is now a strong component of a CS and a Statistics degree, the true power of statistics only comes when it is married with computation.

Probabilistic Graphical Models are a marriage of Graph Theory with Probabilistic Methods and they were all the rage among Machine Learning researchers in the mid 2000s. Variational methods, Gibbs Sampling, and Belief Propagation were being pounded into the brains of CMU graduate students when I was in graduate school (2005-2011) and provided us with a superb mental framework for thinking about machine learning problems. I learned most of what I know about Graphical Models from Carlos Guestrin and Jonathan Huang. Carlos Guestrin is now the CEO of GraphLab, Inc (now known as Dato) which is a company that builds large scale products for machine learning on graphs and Jonathan Huang is a senior research scientist at Google.

The video below is a high level overview of GraphLab, but it serves as a very nice overview of "graphical thinking" and how it fits into the modern data scientist's tool-belt. Carlos is an excellent lecturer and his presentation is less about the company's product and more about ways for thinking about next generation machine learning systems.

A Computational Introduction to Probabilistic Graphical Models
by GraphLab, Inc CEO Prof. Carlos Guestrin

If you think that deep learning is going to solve all of your machine learning problems, you should really take a look at the above video.  If you're building recommender systems, an analytics platform for healthcare data, designing a new trading algorithm, or building the next generation search engine, Graphical Models are a perfect place to start.

Further reading:
Belief Propagation Algorithm Wikipedia Page
An Introduction to Variational Methods for Graphical Models by Michael Jordan et al.
Michael Jordan's webpage (one of the titans of inference and graphical models)

3. Deep Learning and Machine Learning (Data-Driven Machines)

Machine Learning is about learning from examples and today's state-of-the-art recognition techniques require a lot of training data, a deep neural network, and patience. Deep Learning emphasizes the network architecture of today's most successful machine learning approaches.  These methods are based on "deep" multi-layer neural networks with many hidden layers. NOTE: I'd like to emphasize that using deep architectures (as of 2015) is not new.  Just check out the following "deep" architecture from 1998.

LeNet-5 Figure From Yann LeCun's seminal "Gradient-based learning
applied to document recognition" paper.

When you take a look at a modern guide to LeNet, it comes with the following disclaimer:

"To run this example on a GPU, you need a good GPU. It needs at least 1GB of GPU RAM. More may be required if your monitor is connected to the GPU.

When the GPU is connected to the monitor, there is a limit of a few seconds for each GPU function call. This is needed as current GPUs can’t be used for the monitor while doing computation. Without this limit, the screen would freeze for too long and make it look as if the computer froze. This example hits this limit with medium-quality GPUs. When the GPU isn’t connected to a monitor, there is no time limit. You can lower the batch size to fix the time out problem."

It really makes me wonder how Yann was able to get anything out of his deep model back in 1998. Perhaps it's not surprising that it took another decade for the rest of us to get the memo.

UPDATE: Yann pointed out (via a Facebook comment) that the ConvNet work dates back to 1989. "It had about 400K connections and took about 3 weeks to train on the USPS dataset (8000 training examples) on a SUN4 machine." -- LeCun



NOTE: At roughly the same time (~1998) two crazy guys in California were trying to cache the entire internet inside the computers in their garage (they started some funny-sounding company which starts with a G). I don't know how they did it, but I guess sometimes to win big you have to do things that don't scale. Eventually the world will catch up.

Further reading:
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, November 1998.

Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard and L. D. Jackel: Backpropagation Applied to Handwritten Zip Code Recognition, Neural Computation, 1(4):541-551, Winter 1989

Deep Learning code: Modern LeNet implementation in Theano and docs.


Conclusion

I don't see traditional first-order logic making a comeback anytime soon. And while there is a lot of hype behind deep learning, distributed systems and "graphical thinking" are likely to make a much more profound impact on data science than heavily optimized CNNs. There is no reason why deep learning can't be combined with a GraphLab-style architecture, and some of the new exciting machine learning work in the next decade is likely to be a marriage of these two philosophies.


You can also check out a relevant post from last month:
Deep Learning vs Machine Learning vs Pattern Recognition

Discuss on Hacker News
          Three Fundamental Dimensions for Thinking About Machine Learning Systems        
Today, let's set cutting-edge machine learning and computer vision techniques aside. You probably already know that computer vision (or "machine vision") is the branch of computer science / artificial intelligence concerned with recognizing objects like cars, faces, and hand gestures in images. And you also probably know that Machine Learning algorithms are used to drive state-of-the-art computer vision systems. But what's missing is a birds-eye view of how to think about designing new learning-based systems. So instead of focusing on today's trendiest machine learning techniques, let's go all the way back to day 1 and build ourselves a strong foundation for thinking about machine learning and computer vision systems.




Allow me to introduce three fundamental dimensions which you can follow to obtain computer vision mastery. The first dimension is mathematical, the second is verbal, and the third is intuitive.

On a personal level, most of my daily computer vision activities directly map onto these dimensions. When I'm at a coffee shop, I prefer the mathematical - pen and paper are my weapons of choice. When it's time to get ideas out of my head, there's nothing like a solid founder-founder face-to-face meeting, an occasional MIT visit to brainstorm with my scientist colleagues, or simply rubberducking (rubber duck debugging) with developers. And when it comes to engineering, interacting with a live learning system can help develop the intuition necessary to make a system more powerful, more efficient, and ultimately much more robust.

Mathematical: Learn to love the linear classifier

At the core of machine learning is mathematics, so you shouldn't be surprised that I include mathematical as one of the three fundamental dimensions of thinking about computer vision.

The single most important concept in all of machine learning which you should master is the idea of the classifier. For some of you, classification is a well-understood problem; however, too many students prematurely jump into more complex algorithms like randomized decision forests and multi-layer neural networks, without first grokking the power of the linear classifier. Plenty of data scientists will agree that the linear classifier is the most fundamental machine learning algorithm. In fact, when Peter Norvig, Director of Research at Google, was asked "Which AI field has surpassed your expectations and surprised you the most?" in his 2010 interview, he answered with "machine learning by linear separators."

The illustration below depicts a linear classifier. In two dimensions, a linear classifier is a line which separates the positive examples from the negative examples.  You should first master the 2D linear classifier, even though in most applications you'll need to explore a higher-dimensional feature space. My personal favorite learning algorithm is the linear support vector machine, or linear SVM. In an SVM, overly-confident data points do not influence the decision boundary. Or put another way, learning with these confident points is like they aren't even there! This is a very useful property for large-scale learning problems where you can't fit all data into memory. You're going to want to master the linear SVM (and how it relates to Linear Discriminant Analysis, Linear Regression, and Logistic Regression) if you're going to pass one of my whiteboard data-science interviews.


Linear Support Vector Machine from the SVM Wikipedia page


An intimate understanding of the linear classifier is necessary to understand how deep learning systems work.  The neurons inside a multi-layer neural network are little linear classifiers, and while the final decision boundary is non-linear, you should understand the underlying primitives very well. Loosely speaking, you can think of the linear classifier as a simple spring system and a more complex classifier as a higher-order assembly of springs.


Also, there are going to be scenarios in your life as a data-scientist where a linear classifier should be the first machine learning algorithm you try. So don't be afraid to use some pen and paper, get into that hinge loss, and master the fundamentals.
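If you want to go beyond pen and paper, here is a minimal sketch of a 2D linear classifier trained by subgradient descent on the hinge loss -- essentially a bare-bones linear SVM on synthetic data. The data, regularization constant, and learning rate are arbitrary choices for illustration, not a recipe.

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(+2, 1, (100, 2))])
    y = np.hstack([-np.ones(100), np.ones(100)])      # labels in {-1, +1}

    w, b = np.zeros(2), 0.0
    lam, lr, n = 0.01, 0.1, len(y)
    for epoch in range(500):
        margins = y * (X @ w + b)
        mask = margins < 1        # points with margin >= 1 contribute nothing:
                                  # the "confident points don't matter" property
        grad_w = lam * w - (y[mask, None] * X[mask]).sum(axis=0) / n
        grad_b = -y[mask].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b

    print("learned hyperplane:", w, b)

Note how only the points violating the margin show up in the gradient -- that is the hinge loss at work, and it is exactly the property described above.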

Further reading: Google's Research Director talks about Machine Learning. Peter Norvig's Reddit AMA on YouTube from 2010.
Further reading: A demo for playing with linear classifiers in the browser. Linear classifier Javascript demo from Stanford's CS231n: Convolutional Neural Networks for Visual Recognition.
Further reading: My blog post: Deep Learning vs Machine Learning vs Pattern Recognition

Verbal: Talk about your vision (and join a community)

As you start acquiring knowledge of machine learning concepts, the best way forward is to speak up. Learn something, then teach a friend. As counterintuitive as it sounds, when it comes down to machine learning mastery, human-human interaction is key. This is why getting an ML-heavy Master's or PhD degree is ultimately the best bet for those adamant about becoming pioneers in the field. Daily conversations are necessary to strengthen your ideas.  See Raphael's "The School of Athens" for a depiction of what I think of as the ideal learning environment.  I'm sure half of those guys were thinking about computer vision.


An ideal ecosystem for collaboration and learning about computer vision


If you're not ready for a full-time graduate-level commitment to the field, consider a.) taking an advanced undergraduate course in vision/learning from your university, b.) a machine learning MOOC, or c.) taking part in a practical and application-focused online community/course focusing on computer vision.

During my 12-year academic stint, I made the observation that talking to your peers about computer vision and machine learning is more important than listening to teachers/supervisors/mentors.  Of course, there's much value in having a great teacher, but don't be surprised if you get 100x more face-to-face time with your friends compared to student-teacher interactions.  So if you take an online course like Coursera's Machine Learning MOOC, make sure to take it with friends.  Pause the video and discuss. Go to dinner and discuss. Write some code and discuss. Rinse, lather, repeat.

Coursera's Machine Learning MOOC taught by Andrew Ng


Another great opportunity is to follow Adrian Rosebrock's pyimagesearch.com blog, where he focuses on python and computer vision applications.  

Further reading: Old blog post: Why your vision lab needs a reading group

Homework assignment: Find somebody on the street and teach them about machine learning.

Intuitive: Play with a real-time machine learning system

The third and final dimension is centered around intuition. Intuition is the ability to understand something immediately, without the need for conscious reasoning. The following guidelines are directed towards real-time object detection systems, but can also transfer over to other applications like learning-based attribution models for advertisements, high-frequency trading, as well as numerous tasks in robotics.

To gain some true insights about object detection, you should experience a real-time object detection system.  There's something unique about seeing a machine learning system run in real-time, right in front of you.  And when you get to control the input to the system, such as when using a webcam, you can learn a lot about how the algorithms work.  For example, seeing the classification score go down as you occlude the object of interest, and seeing the detection box go away when the object goes out of view is fundamental to building intuition about what works and what elements of a system need to improve.

I see countless students tweaking an algorithm, applying it to a static large-scale dataset, and then waiting for the precision-recall curve to be generated. I understand that this is the hard and scientific way of doing things, but unless you've already spent a few years making friends with every pixel, you're unlikely to make a lasting contribution this way. And it's not very exciting -- you'll probably fall asleep at your desk.

Using a real-time feedback loop (see illustration below), you can learn about the patterns which are intrinsically difficult to classify, as well what environmental variations (lights, clutter, motion) affect your system the most.  This is something which really cannot be done with a static dataset.  So go ahead, mine some intuition and play.
Visual Debugging: Designing the vision.ai real-time gesture-based controller in Fall 2013

Visual feedback is where our work at vision.ai truly stands out. Take a look at the following video, where we show a live example of training and playing with a detector based on vision.ai's VMX object recognition system.


NOTE: There are a handful of other image recognition systems out there which you can turn into real-time vision systems, but be warned that optimization for real-time applications requires some non-trivial software engineering experience.  We've put a lot of care into our system so that the detection scores are analogous to a linear SVM scoring strategy. Making the output of a non-trivial learning algorithm backwards-compatible with a linear SVM isn't always easy, but in my opinion, well-worth the effort.

Extra Credit: See comments below for some free VMX by vision.ai beta software licenses so you can train some detectors using our visual feedback interface and gain your own machine vision intuition.

Conclusion

The three dimensions, namely mathematical, verbal, and intuitive provide different ways for advancing your knowledge of machine learning and computer vision systems.  So remember to love the linear classifier, talk to your friends, and use a real-time feedback loop when designing your machine learning system.





          Deep Learning vs Machine Learning vs Pattern Recognition        
Let's take a close look at three related terms (Deep Learning vs Machine Learning vs Pattern Recognition), and see how they relate to some of the hottest tech-themes in 2015 (namely Robotics and Artificial Intelligence). In our short journey through jargon, you should acquire a better understanding of how computer vision fits in, as well as gain an intuitive feel for how the machine learning zeitgeist has slowly evolved over time.

Fig 1. Putting a human inside a computer is not Artificial Intelligence
(Photo from WorkFusion Blog)

If you look around, you'll see no shortage of jobs at high-tech startups looking for machine learning experts. While only a fraction of them are looking for Deep Learning experts, I bet most of these startups can benefit from even the most elementary kind of data scientist. So how do you spot a future data-scientist? You learn how they think. 

The three highly-related "learning" buzz words

“Pattern recognition,” “machine learning,” and “deep learning” represent three different schools of thought.  Pattern recognition is the oldest (and as a term is quite outdated). Machine Learning is the most fundamental (one of the hottest areas for startups and research labs as of today, early 2015). And Deep Learning is the new, the big, the bleeding-edge -- we’re not even close to thinking about the post-deep-learning era.  Just take a look at the following Google Trends graph.  You'll see that a) Machine Learning is rising like a true champion, b) Pattern Recognition started as synonymous with Machine Learning, c) Pattern Recognition is dying, and d) Deep Learning is new and rising fast.



1. Pattern Recognition: The birth of smart programs

Pattern recognition was a term popular in the 70s and 80s. The emphasis was on getting a computer program to do something “smart” like recognize the character "3". And it really took a lot of cleverness and intuition to build such a program. Just think of "3" vs "B" and "3" vs "8".  Back in the day, it didn’t really matter how you did it as long as there was no human-in-a-box pretending to be a machine. (See Figure 1)  So if your algorithm would apply some filters to an image, localize some edges, and apply morphological operators, it was definitely of interest to the pattern recognition community.  Optical Character Recognition grew out of this community and it is fair to call “Pattern Recognition” the “Smart” Signal Processing of the 70s, 80s, and early 90s. Decision trees, heuristics, quadratic discriminant analysis, etc all came out of this era. Pattern Recognition became something CS folks did, and not EE folks.  One of the most popular books from that time period is the invaluable Duda & Hart "Pattern Classification" book, which is still a great starting point for young researchers.  But don't get too caught up in the vocabulary, it's a bit dated.



The character "3" partitioned into 16 sub-matrices. Custom rules, custom decisions, and custom "smart" programs used to be all the rage. 


Quiz: The most popular Computer Vision conference is called CVPR and the PR stands for Pattern Recognition.  Can you guess the year of the first CVPR conference?

2. Machine Learning: Smart programs can learn from examples

Sometime in the early 90s people started realizing that a more powerful way to build pattern recognition algorithms is to replace an expert (who probably knows way too much about pixels) with data (which can be mined from cheap laborers).  So you collect a bunch of face images and non-face images, choose an algorithm, and wait for the computations to finish.  This is the spirit of machine learning.  "Machine Learning" emphasizes that the computer program (or machine) must do some work after it is given data.  The Learning step is made explicit.  And believe me, waiting 1 day for your computations to finish scales better than inviting your academic colleagues to your home institution to design some classification rules by hand.


"What is Machine Learning" from Dr Natalia Konstantinova's Blog. The most important part of this diagram are the "Gears" which suggests that crunching/working/computing is an important step in the ML pipeline.

As Machine Learning grew into a major research topic in the mid 2000s, computer scientists began applying these ideas to a wide array of problems.  No longer was it only character recognition, cat vs. dog recognition, and other “recognize a pattern inside an array of pixels” problems.  Researchers started applying Machine Learning to Robotics (reinforcement learning, manipulation, motion planning, grasping), to genome data, as well as to predict financial markets.  Machine Learning was married with Graph Theory under the brand “Graphical Models,” every robotics expert had no choice but to become a Machine Learning Expert, and Machine Learning quickly became one of the most desired and versatile computing skills.  However "Machine Learning" says nothing about the underlying algorithm.  We've seen convex optimization, Kernel-based methods, Support Vector Machines, as well as Boosting have their winning days.  Together with some custom manually engineered features, we had lots of recipes, lots of different schools of thought, and it wasn't entirely clear how a newcomer should select features and algorithms.  But that was all about to change...

Further reading: To learn more about the kinds of features that were used in Computer Vision research see my blog post: From feature descriptors to deep learning: 20 years of computer vision.

3. Deep Learning: one architecture to rule them all

Fast forward to today and what we’re seeing is a large interest in something called Deep Learning. The most popular kinds of Deep Learning models, as they are used in large-scale image recognition tasks, are known as Convolutional Neural Nets, or simply ConvNets.


ConvNet diagram from Torch Tutorial

Deep Learning emphasizes the kind of model you might want to use (e.g., a deep convolutional multi-layer neural network) and that you can use data to fill in the missing parameters.  But with deep-learning comes great responsibility.  Because you are starting with a model of the world which has a high dimensionality, you really need a lot of data (big data) and a lot of crunching power (GPUs). Convolutions are used extensively in deep learning (especially computer vision applications), and the architectures are far from shallow.

If you're starting out with Deep Learning, simply brush up on some elementary Linear Algebra and start coding.  I highly recommend Andrej Karpathy's Hacker's guide to Neural Networks. Implementing your own CPU-based backpropagation algorithm on a non-convolution based problem is a good place to start.
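For the sake of illustration, here is what such a first exercise might look like: a one-hidden-layer network trained on XOR with plain NumPy and hand-written backpropagation. The architecture, learning rate, and iteration count are arbitrary, and this is a sketch in the spirit of the exercise rather than anything taken from Karpathy's guide.

    import numpy as np

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)

    rng = np.random.default_rng(42)
    W1, b1 = rng.normal(0, 1, (2, 8)), np.zeros(8)
    W2, b2 = rng.normal(0, 1, (8, 1)), np.zeros(1)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    lr = 1.0
    for step in range(10000):
        # forward pass
        h = sigmoid(X @ W1 + b1)          # hidden activations
        p = sigmoid(h @ W2 + b2)          # predictions
        # backward pass (squared-error loss), chain rule layer by layer
        dp = (p - y) * p * (1 - p)
        dW2, db2 = h.T @ dp, dp.sum(axis=0)
        dh = (dp @ W2.T) * h * (1 - h)
        dW1, db1 = X.T @ dh, dh.sum(axis=0)
        # gradient descent update
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2

    print(np.round(p, 2))                 # should approach [[0], [1], [1], [0]]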

There are still lots of unknowns. The theory of why deep learning works is incomplete, and no single guide or book is better than true machine learning experience.  There are lots of reasons why Deep Learning is gaining popularity, but Deep Learning is not going to take over the world.  As long as you continue brushing up on your machine learning skills, your job is safe. But don't be afraid to chop these networks in half, slice 'n dice at will, and build software architectures that work in tandem with your learning algorithm.  The Linux Kernel of tomorrow might run on Caffe (one of the most popular deep learning frameworks), but great products will always need great vision, domain expertise, market development, and most importantly: human creativity.

Other related buzz-words

Big-data is the philosophy of measuring all sorts of things, saving that data, and looking through it for information.  For business, this big-data approach can give you actionable insights.  In the context of learning algorithms, we’ve only started seeing the marriage of big-data and machine learning within the past few years.  Cloud-computing, GPUs, DevOps, and PaaS providers have made large scale computing within reach of the researcher and ambitious "everyday" developer. 

Artificial Intelligence is perhaps the oldest term, the most vague, and the one that has gone through the most ups and downs in the past 50 years. When somebody says they work on Artificial Intelligence, you are either going to want to laugh at them or take out a piece of paper and write down everything they say.

Further reading: My 2011 Blog post Computer Vision is Artificial Intelligence.

Conclusion

Machine Learning is here to stay. Don't think about it as Pattern Recognition vs Machine Learning vs Deep Learning, just realize that each term emphasizes something a little bit different.  But the search continues.  Go ahead and explore. Break something. We will continue building smarter software and our algorithms will continue to learn, but we've only begun to explore the kinds of architectures that can truly rule-them-all.

If you're interested in real-time vision applications of deep learning, namely those suitable for robotic and home automation applications, then you should check out what we've been building at vision.ai. Hopefully in a few days, I'll be able to say a little bit more. :-)

Until next time.





          Coursera on Deep Learning with Andrew Ng        
Build Your Career in AI

Take our new Deep Learning  courses, now open on Coursera  

Enroll.

By Andrew Ng

More on the courses.


          Free Deep Learning Book Completed         
Data Science Central points to a free book on deep learning by the MIT press.   Ultimately very technical, but the introductions are useful for anyone interested in the topic.   Table of contents and links to all sections at the link below.

Free Deep Learning Book (MIT Press)  Posted by Vincent Granville  
The Deep Learning textbook is a resource intended to help students and practitioners enter the field of machine learning in general and deep learning in particular. The online version of the book is now complete and will remain available online for free. .... " 

This is the same book that was mentioned previously, it is now completed.


          Episode 82: Attack of the two-pizza teams        
...Eventually, someone has to clean up the leftover pizza. ...That sweet OpEx. ..."Easy to stay." Amazon came out with a slew of features last week. This week we discuss them and take some cracks at the broad, portfolio approach at AWS compared to historic (like .Net) platform approaches. We also discuss footwear and what to eat and where to stay in Las Vegas. Footware Kenneth Cole slip on shoes (http://amzn.to/2gH6OzD). Keen Austin shoes, slip-on (http://amzn.to/2h2gveX) and lace (http://amzn.to/2ggll4y). The Doc Martin's Coté used to wear, Hickmire (http://amzn.to/2hlPnIJ). Mid-roll Coté: the Cloud Native roadshows are over, but check out the cloud native WIP I have at cote.io/cloud2 (http://cote.io/cloud2) or, just check out some excerpts on working with auditors (https://medium.com/@cote/auditors-your-new-bffs-918c8671897a#.et5tv7p7l), selecting initial projects (https://medium.com/@cote/getting-started-picking-your-first-cloud-native-projects-or-every-digital-transformation-starts-d0b1295f3712#.v7jpyjvro), and dealing with legacy (https://medium.com/built-to-adapt/deal-with-legacy-before-it-deals-with-you-cc907c800845#.ixtz1kqdz). Matt: Presenting at the CC Dojo #3, talking DevOps in Tokyo (https://connpass.com/event/46308/) AWS re:Invent Matt Ray heroically summarizes all here. Richard has a write-up as well (https://www.infoq.com/news/2016/12/aws-reinvent-recap). RedMonk re:Cap (http://redmonk.com/sogrady/2016/12/07/the-redmonk-reinvent-recap/) Global Partner Summit Don't hedge your bets, "AWS has no time for uncommitted partners" (http://www.zdnet.com/article/andy-jassy-warns-aws-has-no-time-for-uncommitted-partners/) "10,000 new Partners have joined the APN in the past 12 months" (https://aws.amazon.com/blogs/aws/aws-global-partner-summit-report-from-reinvent-2016/) Day 1 - "I'd like to tell you about…" Amazon Lightsail (https://aws.amazon.com/blogs/aws/amazon-lightsail-the-power-of-aws-the-simplicity-of-a-vps/) Monthly instances with memory, cpu, storage & static IP Bitnami! Hello Digital Ocean & Linode Amazon Athena (https://aws.amazon.com/blogs/aws/amazon-athena-interactive-sql-queries-for-data-in-amazon-s3/) S3 SQL queries, based on Presto distributed SQL engine JSON, CSV, log files, delimited text, others Coté: this seems pretty amazing. Amazon Rekognition (https://aws.amazon.com/blogs/aws/amazon-rekognition-image-detection-and-recognition-powered-by-deep-learning/) Image detection & recognition Amazon Polly (https://aws.amazon.com/blogs/aws/polly-text-to-speech-in-47-voices-and-24-languages/) Text to Speech in 47 Voices and 24 Languages Coté: Makes transcripts? Amazon Lex (https://aws.amazon.com/blogs/aws/amazon-lex-build-conversational-voice-text-interfaces/) Conversational voice & text interface builder (ie. chatbots) Coté: make chat-bots and such. AWS Greengrass (https://aws.amazon.com/blogs/aws/aws-greengrass-ubiquitous-real-world-computing/) Local Lambda processing for IoT Coté: is this supposed to be, like, for running Lambda things on disconnected devices? Like fPaaS in my car? AWS Snowball Edge & Snowmobile (https://aws.amazon.com/blogs/aws/aws-snowball-edge-more-storage-local-endpoints-lambda-functions/) Local processing of data? S3/NFS and local Lambda processing? 
I'm thinking easy hybrid on-ramp Not just me (https://twitter.com/CTOAdvisor/status/806320423881162753) More on it (http://www.techrepublic.com/article/how-amazon-is-moving-closer-to-on-premises-compute-with-snowball-edge/) Move exabytes in weeks (https://aws.amazon.com/blogs/aws/aws-snowmobile-move-exabytes-of-data-to-the-cloud-in-weeks/) "Snowmobile is a ruggedized, tamper-resistant shipping container 45 feet long, 9.6 feet high, and 8 feet wide. It is waterproof, climate-controlled, and can be parked in a covered or uncovered area adjacent to your existing data center." Coté: LEGOS! More instance types, Elastic GPUs, F1 Instances, PostgreSQL for Aurora High I/O (I3 3.3 million IOPs 16GB/s), compute (C5 72 vCPUs, 144 GiB), memory (R4 488 Gib), burstable (T2 shared) (https://aws.amazon.com/blogs/aws/ec2-instance-type-update-t2-r4-f1-elastic-gpus-i3-c5/) Mix EC2 instance type with a 1-8 GiB GPU (https://aws.amazon.com/blogs/aws/in-the-work-amazon-ec2-elastic-gpus/) More! (https://aws.amazon.com/blogs/aws/developer-preview-ec2-instances-f1-with-programmable-hardware/) F1: FPGA EC2 instances, also available for use in the AWS Marketplace (https://aws.amazon.com/blogs/aws/amazon-aurora-update-postgresql-compatibility/) RDS vs. Aurora Postgres? Aurora is more fault tolerant apparently? Day 2 AWS OpsWorks for Chef Automate (https://aws.amazon.com/opsworks/chefautomate/) Chef blog (https://blog.chef.io/2016/12/01/chef-automate-now-available-fully-managed-service-aws/) Fully managed Chef Server & Automate Previous OpsWorks now called "OpsWorks Stacks" Cloud Opinion approves the Chef strategy (https://twitter.com/cloud_opinion/status/804374597449584640) EC2 Systems Manager Tools for managing EC2 & on-premises systems (https://aws.amazon.com/ec2/systems-manager/) AWS Codebuild Managed elastic build service with testing (https://aws.amazon.com/blogs/aws/aws-codebuild-fully-managed-build-service/) AWS X-Ray (https://aws.amazon.com/blogs/aws/aws-x-ray-see-inside-of-your-distributed-application/) Distributed debugging service for EC2/ECS/Lambda? 
"easy way for developers to "follow-the-thread" as execution traverses EC2 instances, ECS containers, microservices, AWS database and messaging services" AWS Personal Health Dashboard (https://aws.amazon.com/blogs/aws/new-aws-personal-health-dashboard-status-you-can-relate-to/) Personalized AWS monitoring & CloudWatch Events auto-remediation Disruptive to PAAS monitoring & APM (New Relic, DataDog, App Dynamics) AWS Shield (https://aws.amazon.com/blogs/aws/aws-shield-protect-your-applications-from-ddos-attacks/) DDoS protection Amazon Pinpoint Mobile notification & analytics service (https://aws.amazon.com/blogs/aws/amazon-pinpoint-hit-your-targets-with-aws/) AWS Glue Managed data catalog & ETL (extract, transform & load) service for data analysis AWS Batch Automated AWS provisioning for batch jobs (https://aws.amazon.com/blogs/aws/aws-batch-run-batch-computing-jobs-on-aws/) C# in Lamba, Lambda Edge, AWS Step Functions Werner Vogels: "serverless, there is no cattle, only the herd" Lambda Edge (https://aws.amazon.com/blogs/aws/coming-soon-lambda-at-the-edge/) for running in response to CloudFront events, ""intelligent" processing of HTTP requests at a location that is close" More (https://aws.amazon.com/blogs/aws/new-aws-step-functions-build-distributed-applications-using-visual-workflows/) Step Functions a visual workflow "state machine" for Lambda functions More (https://serverless.zone/faas-is-stateless-and-aws-step-functions-provides-state-as-a-service-2499d4a6e412) BLOX (https://aws.amazon.com/blogs/compute/introducing-blox-from-amazon-ec2-container-service/): EC2 Container Service Scheduler Open source scheduler, watches CloudWatch events for managing ECS deployments Blox.github.io Analysis discussion for all the AWS stuff Jesus! I couldn't read it all! So, what's the role of Lambda here? It seems like the universal process thingy - like AppleScript, bash scripts, etc. for each part: if you need/want to add some customization to each thing, put a Lambda on it. What's the argument against just going full Amazon, in the same way you'd go full .Net, etc.? Is it cost? Lockin? Performance (people always talk about Amazon being kind of flakey at times - but what isn't flakey, your in-house run IT? Come on.) BONUS LINKS! Not covered in episode. Docker for AWS "EC2 Container Service, Elastic Beanstalk, and Docker for AWS all cost nothing; the only costs are those incurred by using AWS resources like EC2 or EBS." (http://www.infoworld.com/article/3145696/application-development/docker-for-aws-whos-it-really-for.html) Docker gets paid on usage? Apparently an easier learning curve than ECS + AWS services, but whither Blox? Time to Break up Amazon? Someone has an opinion (http://www.geekwire.com/2016/new-study-compares-amazon-19th-century-robber-barons-urges-policymakers-break-online-retail-giant/) HPE Discover, all about the "Hybrid Cloud" Hybrid it up! (http://www.zdnet.com/article/hpe-updates-its-converged-infrastructure-hybrid-cloud-software-lineup/) Killed "The Machine" (http://www.theregister.co.uk/2016/11/29/hp_labs_delivered_machine_proof_of_concept_prototype_but_machine_product_is_no_more/) HPE's Synergy software, based on OpenStack (is this just Helion rebranded?) 
Not great timing for a conference Sold OpenStack & CloudFoundry bits to SUSE (http://thenewstack.io/suse-add-hpes-openstack-cloud-foundry-portfolio-boost-kubernetes-investment/), the new "preferred Linux partner": How Google is Challenging AWS Ben on public cloud (https://stratechery.com/2016/how-google-cloud-platform-is-challenging-aws/) "open-sourcing Kubernetes was Google's attempt to effectively build a browser on top of cloud infrastructure and thus decrease switching costs; the company's equivalent of Google Search will be machine learning." Exponent.fm episode 097 — Google vs AWS (http://exponent.fm/episode-097-google-versus-aws/) Recommendations Brandon: Apple Wifi Calling (https://support.apple.com/en-us/HT203032) & Airplane mode (https://support.apple.com/en-us/HT204234). Westworld worth watching (http://www.hbo.com/westworld). Matt: Backyard Kookaburras (https://www.youtube.com/watch?v=DmNn7P59HcQ). Magpies too! (http://www.musicalsoupeaters.com/swooping-season/) This gif (https://media.giphy.com/media/wik7sKOl86OFq/giphy.gif). Coté: W Hotel in Las Vegas (http://www.wlasvegas.com/) and lobster eggs benedict (https://www.instagram.com/p/BNxAyQbjKCQ/) at Payard's in Ceasers' Outro: "I need my minutes," Soul Position (http://genius.com/Soul-position-i-need-my-minutes-lyrics).
          CES 2016: Nvidia announced the Drive PX 2 with Pascal GPUs        
At CES 2016, Nvidia announced the successor to the Drive PX, the Drive PX 2. Nvidia calls the Drive PX 2 the first "in-car AI deep-learning" device, a mobile supercomputer the size of a lunchbox. According to Nvidia, the Drive PX 2 is another important step toward building autonomous vehicles in the future. The Drive PX 2 is aimed at the automotive industry and packs particularly high computational power, enough to enable automakers ...
          Self-Driving Deep Learning with Lex Fridman        
Self-driving cars are here. Fully autonomous systems like Waymo are being piloted in less complex circumstances. Human-in-the-loop systems like Tesla Autopilot navigate for drivers when it is safe to do so, and let the human take control in ambiguous circumstances. Computers are great at memorization, but not yet great at reasoning. We cannot enumerate to a computer every single circumstance that a car might find itself in. The computer needs to

Continue reading...


          Distributed Deep Learning with Will Constable        
Deep learning allows engineers to build models that can make decisions based on training data. These models improve over time using stochastic gradient descent. When a model gets big enough, the training must be broken up across multiple machines. Two strategies for doing this are “model parallelism” which divides the model across machines and “data parallelism” which divides the data across multiple copies of the model. Distributed deep learning brings

Continue reading...


          Deep Learning and the Artificial Intelligence Revolution: Part 4        

Welcome to the final installment of our 4-part blog series.

  • In part 1, we looked at the history of AI, and why it is taking off now
  • In part 2, we discussed the differences between AI, Machine Learning, and Deep Learning
  • In part 3, we dived deeper into deep learning and evaluated key considerations when selecting a database for new projects
  • We’ll wrap up in today’s part 4 with a discussion on why MongoDB is being used for deep learning, and provide examples of where it is being used

If you want to get started right now, download the complete Deep Learning and Artificial Intelligence white paper.

Why MongoDB for Deep Learning?

If you haven’t read part 3, it’s worth visiting that post to learn more about the key considerations when selecting a database for new deep learning projects. As the following section demonstrates, developers and data scientists can harness MongoDB as a flexible, scalable, and performant distributed database to meet the rigors of AI application development.

Flexible Data Model

MongoDB's document data model makes it easy for developers and data scientists to store and combine data of any structure within the database, without giving up sophisticated validation rules to govern data quality. The schema can be dynamically modified without application or database downtime that results from costly schema modifications or redesign incurred by relational database systems.

This data model flexibility is especially valuable to deep learning, which involves constant experimentation to uncover new insights and predictions:

  • Input datasets can comprise rapidly changing structured and unstructured data ingested from clickstreams, log files, social media and IoT sensor streams, CSV, text, images, video, and more. Many of these datasets do not map well into the rigid row and column formats of relational databases.
  • The training process often involves adding new hidden layers, feature labels, hyperparameters, and input data, requiring frequent modifications to the underlying data model.

A database supporting a wide variety of input datasets, with the ability to seamlessly modify parameters for model training, is therefore essential.

Rich Programming and Query Model

MongoDB offers both native drivers and certified connectors for developers and data scientists building deep learning models with data from MongoDB. The PyMongo driver is the recommended way to work with MongoDB from Python, implementing an idiomatic API that makes development natural for Python programmers. The community developed MongoDB Client for R is also available for R programmers.
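As a small illustration of what working with PyMongo looks like, the sketch below stores a training run's hyperparameters, dataset reference, and metrics as a single flexible document. The database, collection, and field names are hypothetical choices, not something prescribed by MongoDB.

    from datetime import datetime
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    runs = client["deep_learning"]["training_runs"]

    run_id = runs.insert_one({
        "model": "convnet_v3",
        "hyperparameters": {"layers": 5, "learning_rate": 0.01, "batch_size": 128},
        "dataset": {"source": "s3://bucket/images", "num_examples": 1_200_000},
        "started_at": datetime.utcnow(),
        "metrics": []                       # appended to as training progresses
    }).inserted_id

    # Later, push a per-epoch metric into the same document
    runs.update_one({"_id": run_id},
                    {"$push": {"metrics": {"epoch": 1, "val_accuracy": 0.87}}})

Because the document is schemaless, adding a new hyperparameter or feature label later is just another field in the dictionary, with no migration step.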

The MongoDB query language and rich secondary indexes enable developers to build applications that can query and analyze the data in multiple ways. Data can be accessed by single keys, ranges, text search, graph, and geospatial queries through to complex aggregations and MapReduce jobs, returning responses in milliseconds.

To parallelize data processing across a distributed database cluster, MongoDB provides the aggregation pipeline and MapReduce. The MongoDB aggregation pipeline is modeled on the concept of data processing pipelines. Documents enter a multi-stage pipeline that transforms the documents into an aggregated result using native operations executed within MongoDB. The most basic pipeline stages provide filters that operate like queries, and document transformations that modify the form of the output document. Other pipeline operations provide tools for grouping and sorting documents by specific fields as well as tools for aggregating the contents of arrays, including arrays of documents. In addition, pipeline stages can use operators for tasks such as calculating the average or standard deviations across collections of documents, and manipulating strings. MongoDB also provides native MapReduce operations within the database, using custom JavaScript functions to perform the map and reduce stages.
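For example, a short (hypothetical) aggregation pipeline issued from PyMongo might filter training examples and compute per-label statistics entirely inside the database, before anything reaches the model code:

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    examples = client["deep_learning"]["examples"]     # hypothetical collection

    pipeline = [
        {"$match": {"split": "train", "label": {"$exists": True}}},
        {"$group": {"_id": "$label",
                    "count": {"$sum": 1},
                    "avg_width": {"$avg": "$image.width"}}},
        {"$sort": {"count": -1}},
    ]
    for doc in examples.aggregate(pipeline):
        print(doc["_id"], doc["count"], doc["avg_width"])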

In addition to its native query framework, MongoDB also offers a high performance connector for Apache Spark. The connector exposes all of Spark’s libraries, including Python, R, Scala and Java. MongoDB data is materialized as DataFrames and Datasets for analysis with machine learning, graph, streaming, and SQL APIs.

Figure 1: Combining MongoDB with Spark with Sophisticated Analytics & Machine Learning

The MongoDB Connector for Apache Spark can take advantage of MongoDB’s aggregation pipeline and secondary indexes to extract, filter, and process only the range of data it needs – for example, analyzing all customers located in a specific geography. This is very different from simple NoSQL datastores that do not support either secondary indexes or in-database aggregations. In these cases, Spark would need to extract all data based on a simple primary key, even if only a subset of that data is required for the Spark process. This means more processing overhead, more hardware, and longer time-to-insight for data scientists and engineers. To maximize performance across large, distributed data sets, the MongoDB Connector for Apache Spark can co-locate Resilient Distributed Datasets (RDDs) with the source MongoDB node, thereby minimizing data movement across the cluster and reducing latency.
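As a rough sketch of that idea, the PySpark snippet below loads a MongoDB collection as a DataFrame and filters it by region. It assumes the v2.x MongoDB Spark Connector package is on the classpath and that a customers collection with a geo.region field exists -- both are assumptions for illustration, not details from this post.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = (SparkSession.builder
             .appName("mongo-ml")
             .config("spark.mongodb.input.uri",
                     "mongodb://localhost:27017/analytics.customers")
             .getOrCreate())

    df = spark.read.format("com.mongodb.spark.sql.DefaultSource").load()

    # The filter can be pushed down to MongoDB (as an aggregation $match), so only
    # the matching region needs to be shipped to the Spark executors.
    west_coast = df.filter(col("geo.region") == "us-west")
    west_coast.groupBy("segment").count().show()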

Performance, Scalability & Redundancy

Model training time can be reduced by building the deep learning platform on top of a performant and scalable database layer. MongoDB offers a number of innovations to maximize throughput and minimize latency of deep learning workloads:

  • WiredTiger is the default storage engine for MongoDB, developed by the architects of Berkeley DB, the most widely deployed embedded data management software in the world. WiredTiger scales on modern, multi-core architectures. Using a variety of programming techniques such as hazard pointers, lock-free algorithms, fast latching and message passing, WiredTiger maximizes computational work per CPU core and clock cycle. To minimize on-disk overhead and I/O, WiredTiger uses compact file formats and storage compression.
  • For the most latency-sensitive deep learning applications, MongoDB can be configured with the In-Memory storage engine. Based on WiredTiger, this storage engine gives users the benefits of in-memory computing, without trading away the rich query flexibility, real-time analytics, and scalable capacity offered by conventional disk-based databases.
  • To parallelize model training and scale input datasets beyond a single node, MongoDB uses a technique called sharding, which distributes processing and data across clusters of commodity hardware. MongoDB sharding is fully elastic, automatically rebalancing data across the cluster as the input dataset grows, or as nodes are added and removed (a minimal setup sketch follows this list).
  • Within a MongoDB cluster, data from each shard is automatically distributed to multiple replicas hosted on separate nodes. MongoDB replica sets provide redundancy to recover training data in the event of a failure, reducing the overhead of checkpointing.

Tunable Consistency

MongoDB is strongly consistent by default, enabling deep learning applications to immediately read what has been written to the database, thus avoiding the developer complexity imposed by eventually consistent systems. Strong consistency will provide the most accurate results for machine learning algorithms; however in some scenarios, such as SGD, it is acceptable to trade consistency against specific performance goals by distributing queries across a cluster of MongoDB secondary replica set members.
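In PyMongo, that trade-off is expressed as a per-collection read preference. The sketch below keeps the default (primary, strongly consistent) reads for most work and opts throughput-oriented batch reads into secondaries; the connection string and names are hypothetical.

    from pymongo import MongoClient, ReadPreference

    client = MongoClient(
        "mongodb://rs0-a:27017,rs0-b:27017,rs0-c:27017/?replicaSet=rs0")

    # Default behaviour: strongly consistent reads served by the primary
    examples = client["deep_learning"]["examples"]

    # Batch reads (e.g., streaming an SGD epoch) can opt in to secondaries,
    # accepting possibly stale data in exchange for read scaling.
    examples_secondary = client["deep_learning"].get_collection(
        "examples", read_preference=ReadPreference.SECONDARY_PREFERRED)
    training_batch = examples_secondary.find({"split": "train"}).limit(10_000)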

MongoDB AI Deployments

Due to the properties discussed above, MongoDB is serving as the database for many AI and deep learning platforms. A selection of users across different applications and industries follows:

IBM Watson: Analytics & Visualization

Watson Analytics is IBM’s cloud-hosted service providing smart data discovery to guide data exploration, automate predictive analytics and visualize outputs. Watson Analytics is used across banking, insurance, retail, telecommunications, petroleum, and government applications. MongoDB is used alongside DB2 for managing data storage. MongoDB provides a metadata repository of all source data assets and analytics visualizations, stored in rich JSON document structures, with the scalability to support tens of thousands of concurrent users accessing the service.

x.ai: Personal Assistant

x.ai is an AI-powered personal assistant that schedules meetings for its user. Users connect their calendars to x.ai, and then when it's time to set a meeting via email, users instead delegate the scheduling task to 'Amy Ingram' by ccing amy@x.ai. Once she's copied into the email thread, she finds a mutually agreeable time and place and sets up the meeting for you. MongoDB serves as the system of record for the entire x.ai platform, supporting all services including natural language processing, supervised learning, analytics and email communication. MongoDB's flexible data model has been critical in enabling x.ai to rapidly adapt its training and input data sets, while supporting complex data structures. Learn more by reading the case study.

Auto Trader: Predicting Value

The UK’s largest digital car marketplace makes extensive use of machine learning running against data stored in MongoDB. The car’s specifications and details, such as number of previous owners, condition, color, mileage, insurance history, upgrades, and more are stored in MongoDB. This data is extracted by machine learning algorithms written by Auto Trader’s data science team to generate accurate predictions of value, which are then written back to the database. MongoDB was selected due to its flexible data model and distributed design, allowing scalability across a cluster of more than 40 instances. Learn more from coverage in the technology press.

Mintigo: Predictive Sales & Marketing

Founded by former intelligence agency data scientists, Mintigo delivers a predictive marketing engine for companies such as Red Hat. Through sophisticated machine learning algorithms operating against large data sets stored in MongoDB, Mintigo helps marketing and sales organizations better identify leads most likely to convert to customers. Through its engine, Mintigo users average a 4x improvement in overall marketing funnel efficiency. Mintigo runs on AWS, with machine learning algorithms written in Python. MongoDB is used to store multi-TB data sets, and was selected for scalability of streaming data ingest and storage, and schema flexibility. MongoDB’s expressive query framework and secondary indexes feeds the algorithms with relevant data, without needing to scan every record in the database. Learn more from the case study

Geo-Location Analysis for Retail

A US-based mobile app developer has built its Intelligence Engine on MongoDB, processing and storing tens of millions of rich geospatial data points on customers and their locations in real time. The Intelligence Engine uses scalable machine learning and multi-dimensional analytic techniques to surface behavioral patterns that allow retailers to predict and target customers with location-based offers through their mobile devices. MongoDB’s support for geospatial data structures with sophisticated indexing and querying provides the foundation for the machine learning algorithms. MongoDB’s scale-out design with sharding allows the company to scale from 10s to 100s of millions of customer data points.

Natural Language Processing (NLP)

A North American AI developer has built NLP software that is embedded by major consumer electronics brands into smart home and mobile devices. All interactions between the device and user are stored in MongoDB, which are then fed back into the learning algorithms. MongoDB was selected for its schema flexibility that supports rapidly changing data structures.

Bringing Data Science to Talent Acquisition

Working with HR departments in the Fortune 500, this company tackles the resume pile and candidate sourcing problem with data science and workforce intelligence. The company provides real-time analysis and prioritization of applicants by applying AI to thousands of information sources beyond the resume, including public and enterprise data. With predictive analytics generated by its AI algorithms, recruiters can instantly identify the most qualified candidates among active and passive applicants, accelerating the hiring process and reducing costs per hire. MongoDB was selected as the underlying database due to its data model flexibility and scale, coupled with extensive security controls to protect Personally Identifiable Information (PII).

Wrapping Up Part 4

That wraps up our 4-part blog series. Over the course of the blog series we’ve discussed how deep learning and AI have moved well beyond science fiction into the cutting edge of internet and enterprise computing. Access to more computational power in the cloud, advancement of sophisticated algorithms, and the availability of funding are unlocking new possibilities unimaginable just five years ago. But it’s the availability of new, rich data sources that is making deep learning real.

To advance the state of the art, developers and data scientists need to carefully select the underlying databases that manage the input, training, and results data. MongoDB is already helping teams realize the potential of AI.

Remember, if you want to read the entire series in one go, download the complete Deep Learning and Artificial Intelligence white paper.


          Deep Learning and the Artificial Intelligence Revolution: Part 3        

Welcome to part 3 of our 4-part blog series.

  • In part 1 we looked at the history of AI, and why it is taking off now
  • In part 2, we discussed the differences between AI, Machine Learning, and Deep Learning
  • In today’s part 3, we’ll dive deeper into deep learning and evaluate key considerations when selecting a database for new projects
  • We’ll wrap up in part 4 with a discussion on why MongoDB is being used for deep learning, and provide examples of where it is being used

If you want to get started right now, download the complete Deep Learning and Artificial Intelligence white paper.

What is Deep Learning?

Deep learning is a subset of machine learning that has attracted worldwide attention for its recent success solving particularly hard and large-scale problems in areas such as speech recognition, natural language processing, and image classification. Deep learning is a refinement of ANNs, which, as discussed earlier, “loosely” emulate how the human brain learns and solves problems.

Before diving into how deep learning works, it’s important to first understand how ANNs work. ANNs are made up of an interconnected group of neurons, similar to the network of neurons in the brain.

Neuron Model

Figure 1: The Neuron Model

At a simplistic level, a neuron in a neural network is a unit that receives a number of inputs (xi), performs a computation on the inputs, and then sends the output to other nodes or neurons in the network. Weights (wj), or parameters, represent the strength of the input connection and can be either positive or negative. The inputs are multiplied by the associated weights (x1w1, x2w2, ...) and the neuron sums the results from all inputs. The final step is for the neuron to apply an activation function to that sum. The activation function (the sigmoid function is a popular choice) allows an ANN to model complex nonlinear patterns that simpler models may not represent correctly.
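
As a minimal sketch of the neuron in Figure 1 (plain Python with NumPy, using made-up input and weight values), the computation is just a weighted sum followed by a sigmoid activation:

import numpy as np

def sigmoid(z):
    # Squash any real value into the (0, 1) range.
    return 1.0 / (1.0 + np.exp(-z))

# Example inputs (x_i) and weights (w_i); the values are illustrative only.
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.8, 0.1, -0.4])
b = 0.2  # bias term

# Weighted sum of the inputs, followed by the activation function.
output = sigmoid(np.dot(x, w) + b)
print(output)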

Neural Network Diagram

Figure 2: Neural Network Diagram

Figure 2 represents a neural network. The first layer is called the input layer and is where features (x1, x2, x3) are input. The second layer is called the hidden layer. Any layer that is not an input or output layer is a hidden layer. The term deep learning was originally coined because of the multiple levels of hidden layers. Networks typically contain more than 3 hidden layers, and in some cases more than 1,200 hidden layers.

What is the benefit of multiple hidden layers? Certain patterns may need deeper investigation that can be surfaced with the additional hidden layers. Image classification is an area where deep learning can achieve high performance on very hard visual recognition tasks – even exceeding human performance in certain areas. Let’s illustrate this point with an example of how additional hidden layers help perform facial recognition.

Deep Learning Recognition

Figure 3: Deep Learning Image Recognition

When a picture is input into a deep learning network, it is first decomposed into image pixels. The algorithm will then look for patterns of shapes at certain locations in the image. The first hidden layer might try to uncover specific facial patterns: eyes, mouth, nose, ears. Adding an additional hidden layer deconstructs the facial patterns into more granular attributes. For example, a “mouth” could be further deconstructed into “teeth”, “lips”, “gums”, etc. Adding additional hidden layers can break these patterns down even further to recognize the subtlest nuances. The end result is that a deep learning network can break down a very complicated problem into a set of simple questions. The hidden layers are essentially a hierarchical grouping of different variables that provide a better defined relationship. Currently, most deep learning algorithms are supervised; thus, deep learning models are trained against a known truth.

How Does Training Work?

The purpose of training a deep learning model is to reduce the cost function, which is the discrepancy between the expected output and the real output. The connections between the nodes will have specific weights that need to be refined to minimize the discrepancy. By modifying the weights, we can minimize the cost function to its global minimum, which means we’ve reduced the error in our model to its lowest value. The reason deep learning is so computationally intensive is that it requires finding the correct set of weights within millions or billions of connections. This is where constant iteration is required as new sequences of weights are tested repeatedly to find the point where the cost function is at its global minimum.

Deep Learning Image Recognition

Figure 4: Deep Learning Cost Function

A common technique in deep learning is to use backpropagation gradient descent. Gradient descent emerged as an efficient mathematical optimization that works effectively with a large number of dimensions (or features) without having to perform brute force dimensionality analysis. Gradient descent works by computing a gradient (or slope) in the direction of the function’s global minimum based on the weights. During training, weights are first randomly assigned, and an error is calculated. Based on the error, gradient descent will then modify the weights, backpropagate the updated weights through the multiple layers, and retrain the model such that the cost function moves towards the global minimum. This continues iteratively until the cost function reaches the global minimum. There may be instances where gradient descent resolves itself at a local minimum instead of the global minimum. Methods to mitigate this issue include using a convex cost function or generating more randomness for the parameters.
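
To make the iteration concrete, here is a small NumPy sketch of gradient descent on a one-variable least-squares cost function. The data, learning rate, and iteration count are arbitrary illustrative choices rather than values from a real model:

import numpy as np

# Toy data: y is roughly 3 * x plus a little noise.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.1, 3.2, 5.9, 9.1, 11.8])

w = 0.0             # arbitrarily initialized weight
learning_rate = 0.01

for step in range(500):
    predictions = w * x
    error = predictions - y
    cost = np.mean(error ** 2)           # the cost function to minimize
    gradient = 2.0 * np.mean(error * x)  # slope of the cost with respect to w
    w -= learning_rate * gradient        # move against the gradient

print(w, cost)  # w converges toward ~3 as the cost approaches its minimum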

Database Considerations with Deep Learning

Non-relational databases have played an integral role in the recent advancement of the technology enabling machine learning and deep learning. The ability to collect and store large volumes of structured and unstructured data has provided deep learning with the raw material needed to improve predictions. When building a deep learning application, there are certain considerations to keep in mind when selecting a database for management of underlying data.

Flexible Data Model. In deep learning there are three stages where data needs to be persisted – input data, training data, and results data. Deep learning is a dynamic process that typically involves significant experimentation. For example, it is not uncommon for frequent modifications to occur during the experimentation process – tuning hyperparameters, adding unstructured input data, modifying the results output – as new information and insights are uncovered. Therefore, it is important to choose a database that is built on a flexible data model, avoiding the need to perform costly schema migrations whenever data structures need to change.
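
As an example of what that flexibility looks like in practice, each training run could be stored as a self-describing document, so new hyperparameters or result fields can appear later without a schema migration. The collection and field names below are hypothetical:

from pymongo import MongoClient

client = MongoClient()                    # local MongoDB instance assumed
experiments = client.ml_demo.experiments  # hypothetical database/collection

# An early experiment with a minimal set of fields.
experiments.insert_one({
    "model": "lenet",
    "learning_rate": 0.01,
    "epochs": 10,
    "accuracy": 0.971,
})

# A later experiment adds new hyperparameters and unstructured notes;
# documents in the same collection are free to diverge in shape.
experiments.insert_one({
    "model": "resnet-50",
    "learning_rate": 0.001,
    "epochs": 90,
    "optimizer": {"type": "sgd", "momentum": 0.9},
    "augmentation": ["flip", "random_crop"],
    "notes": "added dropout after layer 4",
})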

Scale. One of the biggest challenges of deep learning is the time required to train a model. Deep learning models can take weeks to train – as algorithms such as gradient descent may require many iterations of matrix operations involving billions of parameters. In order to reduce training times, deep learning frameworks try to parallelize the training workload across fleets of distributed commodity servers.

There are two main ways to parallelize training: data parallelism and model parallelism.

  • Data parallelism. Splitting the data across many nodes for processing and storage, enabled by distributed systems such as Apache Spark, MongoDB, and Apache Hadoop
  • Model parallelism. Splitting the model and its associated layers across many nodes, enabled by software libraries and frameworks such as Tensorflow, Caffe, and Theano. Splitting provides parallelism, but does incur a performance cost in coordinating outputs between different nodes

In addition to the model’s training phase, another big challenge of deep learning is that the input datasets are continuously growing, which increases the number of parameters to train against. Not only does this mean that the input dataset may exceed available server memory, but it also means that the matrices involved in gradient descent can exceed the node’s memory as well. Thus, scaling out, rather than scaling up, is important as this enables the workload and associated dataset to be distributed across multiple nodes, allowing computations to be performed in parallel.

Fault Tolerance. Many deep learning algorithms use checkpointing as a way to recover training data in the event of failure. However, frequent checkpointing requires significant system overhead. An alternative is to leverage the use of multiple data replicas hosted on separate nodes. These replicas provide redundancy and data availability without the need to consume resources on the primary node of the system.
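
For the checkpointing approach mentioned above, frameworks typically expose an epoch-level callback. The sketch below uses MXNet's Module API with a tiny synthetic network; the checkpoint prefix, frequency, and data are illustrative assumptions rather than a recommended configuration:

import mxnet as mx
import numpy as np

# Tiny synthetic dataset and network, purely for illustration.
X = np.random.rand(1000, 20).astype("float32")
y = (X.sum(axis=1) > 10).astype("float32")
train_iter = mx.io.NDArrayIter(X, y, batch_size=50)

data = mx.sym.Variable("data")
net = mx.sym.FullyConnected(data, num_hidden=64)
net = mx.sym.Activation(net, act_type="relu")
net = mx.sym.FullyConnected(net, num_hidden=2)
net = mx.sym.SoftmaxOutput(net, name="softmax")

mod = mx.mod.Module(symbol=net, context=mx.cpu())

# Write a checkpoint (symbol + parameters) at the end of every epoch.
mod.fit(
    train_iter,
    num_epoch=5,
    epoch_end_callback=mx.callback.do_checkpoint("my_model", period=1),
)

# After a failure, recover by reloading, e.g., the epoch-5 checkpoint.
sym, arg_params, aux_params = mx.model.load_checkpoint("my_model", 5)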

Consistency. For most deep learning algorithms it is recommended to use a strong data consistency model. With strong consistency each node in a distributed database cluster is operating on the latest copy of the data, though some algorithms, such as Stochastic Gradient Descent (SGD), can tolerate a certain degree of inconsistency. Strong consistency will provide the most accurate results, but in certain situations where faster training time is valued over accuracy, eventual consistency is acceptable. To optimize for accuracy and performance, the database should offer tunable consistency.
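
In MongoDB specifically, consistency and durability can be tuned per collection or per operation through read and write concerns; a small PyMongo sketch (with placeholder names) might look like this:

from pymongo import MongoClient, WriteConcern
from pymongo.read_concern import ReadConcern

client = MongoClient()  # placeholder connection
db = client.ml_demo     # hypothetical database

# Durable writes: acknowledged by a majority of replica set members.
results = db.get_collection(
    "training_results",
    write_concern=WriteConcern(w="majority"),
)

# Reads that only return majority-committed data...
strict_reads = db.get_collection(
    "training_results",
    read_concern=ReadConcern("majority"),
)

# ...versus faster "local" reads that may observe data not yet
# replicated to a majority of members.
relaxed_reads = db.get_collection(
    "training_results",
    read_concern=ReadConcern("local"),
)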

Wrapping Up Part 3

That wraps up the third part of our 4-part blog series. We’ll conclude in part 4 with a discussion on why MongoDB is being used for deep learning, and provide examples of where it is being used.

Remember, if you want to get started right now, download the complete Deep Learning and Artificial Intelligence white paper.


          Deep Learning and the Artificial Intelligence Revolution: Part 2        

Welcome to part 2 of our 4-part blog series.

  • In part 1 we looked at the history of AI, and why it is taking off now
  • In today’s part 2, we will discuss the differences between AI, Machine Learning, and Deep Learning
  • In part 3, we’ll dive deeper into deep learning and evaluate key considerations when selecting a database for new projects
  • We’ll wrap up in part 4 with a discussion on why MongoDB is being used for deep learning, and provide examples of where it is being used

If you want to get started right now, download the complete Deep Learning and Artificial Intelligence white paper.

Differences Between Artificial Intelligence, Machine Learning, and Deep Learning

In many contexts, artificial intelligence, machine learning, and deep learning are used interchangeably, but in reality machine learning and deep learning are subsets of AI. We can think of AI as the branch of computer science focused on building machines capable of intelligent behaviour, while machine learning and deep learning are the practice of using algorithms to sift through data, learn from the data, and make predictions or take autonomous actions. Therefore, instead of programming specific constraints for an algorithm to follow, the algorithm is trained using large amounts of data to give it the ability to independently learn, reason, and perform a specific task.

Figure 1: Timeline of Artificial Intelligence, Machine Learning, and Deep Learning

So what’s the difference between machine learning and deep learning? Before defining deep learning – which we’ll do in part 3, let’s dig deeper into machine learning.

Machine Learning: Supervised vs. Unsupervised

There are two main classes of machine learning approaches: supervised learning and unsupervised learning.

Supervised Learning. Currently, supervised learning is the most common type of machine learning algorithm. With supervised learning, the algorithm takes input data manually labeled by developers and analysts, using it to train the model and generate predictions. Supervised learning can be delineated into two groups: regression and classification problems.

Figure 2: Supervised Regression Example

Figure 2 demonstrates a simple regression problem. Here, an input feature (square feet) and its labeled output (price) are used to fit a curve through the data and make subsequent predictions of property price.
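
A minimal version of this kind of regression, sketched with scikit-learn and made-up square-footage and price values, could look like the following:

import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative labeled training data: square footage -> sale price.
square_feet = np.array([[850], [1200], [1500], [1800], [2400]])
price = np.array([149000, 205000, 255000, 300000, 395000])

model = LinearRegression()
model.fit(square_feet, price)   # learn the fitted line from labeled data

# Predict the price of an unseen 2,000 sq ft property.
print(model.predict([[2000]]))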

Figure 3: Supervised Classification Example

Figure 3 shows an example of supervised classification. The dataset is labeled with benign and malignant tumors for breast cancer patients. The supervised classification algorithm will attempt to segment tumors into two different classifications by fitting a straight line through the data. Future data can then be classified as benign or malignant based on the straight-line classification. Classification problems result in discrete outputs, though that does not necessarily constrain the number of outputs to a fixed set. Figure 3 has only two discrete outputs, but there could be many more classifications (benign, Type 1 malignant, Type 2 malignant, etc.).
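
The same pattern applies to classification. Below is a sketch using scikit-learn with synthetic tumor measurements; the feature, values, and labels are purely illustrative:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative feature (tumor size in cm) and labels: 0 = benign, 1 = malignant.
tumor_size = np.array([[0.5], [1.1], [1.8], [2.5], [3.3], [4.0]])
label = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression()
clf.fit(tumor_size, label)

# Classify new tumors; the outputs are discrete class labels.
print(clf.predict([[1.0], [3.0]]))   # e.g. [0 1]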

Unsupervised Learning. In our supervised learning example, labeled datasets (benign or malignant classifications) help the algorithm determine what the correct answer is. With unsupervised learning, we give the algorithm an unlabeled dataset and depend on the algorithm to uncover structures and patterns in the data.

Figure 4: Unsupervised Learning Example

In Figure 4, there is no information about what each data point represents, and so the algorithm is asked to find structure in the data independently of any supervision. Here, the unsupervised learning algorithm might determine there are two distinct clusters and make a straight-line classification between the clusters. Unsupervised learning is broadly applied in many use cases such as Google News, social network analysis, market segmentation, and astronomical analysis around galaxy formations.
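
Clustering unlabeled points like these can be sketched with scikit-learn's k-means implementation; the data and the choice of two clusters are illustrative assumptions:

import numpy as np
from sklearn.cluster import KMeans

# Unlabeled 2-D points; no ground-truth classes are provided.
points = np.array([
    [1.0, 1.2], [0.8, 1.0], [1.3, 0.9],   # roughly one group
    [5.1, 4.8], [4.9, 5.2], [5.4, 5.0],   # roughly another group
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
assignments = kmeans.fit_predict(points)  # the algorithm discovers the structure

print(assignments)              # e.g. [0 0 0 1 1 1]
print(kmeans.cluster_centers_)  # centers of the discovered clusters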

Wrapping Up Part 2

That wraps up the second part of our 4-part blog series. In Part 3, we’ll dive deeper into deep learning and evaluate key considerations when selecting a database for new projects.

Remember, if you want to get started right now, download the complete Deep Learning and Artificial Intelligence white paper.


          Deep Learning and the Artificial Intelligence Revolution: Part 1        

Deep learning and Artificial Intelligence (AI) have moved well beyond science fiction into the cutting edge of internet and enterprise computing.

Access to more computational power in the cloud, advancement of sophisticated algorithms, and the availability of funding are unlocking new possibilities unimaginable just five years ago. But it’s the availability of new, rich data sources that is making deep learning real.

In this 4-part blog series, we are going to explore deep learning, and the role database selection plays in successfully applying deep learning to business problems:

  • In part 1 today we will look at the history of AI, and why it is taking off now
  • In part 2, we will discuss the differences between AI, Machine Learning, and Deep Learning
  • In part 3, we’ll dive deeper into deep learning and evaluate key considerations when selecting a database for new projects
  • We’ll wrap up in part 4 with a discussion on why MongoDB is being used for deep learning, and provide examples of where it is being used

If you want to get started right now, download the complete Deep Learning and Artificial Intelligence white paper.

The History of Artificial Intelligence

We are living in an era where artificial intelligence (AI) has started to scratch the surface of its true potential. Not only does AI create the possibility of disrupting industries and transforming the workplace, but it can also address some of society’s biggest challenges. Autonomous vehicles may save tens of thousands of lives, and increase mobility for the elderly and the disabled. Precision medicine may unlock tailored individual treatment that extends life. Smart buildings may help reduce carbon emissions and save energy. These are just a few of the potential benefits that AI promises, and is starting to deliver upon.

By 2018, Gartner estimates that machines will author 20% of all business content, and an expected 6 billion IoT-connected devices will be generating a deluge of data. AI will be essential to make sense of it all. No longer is AI confined to science fiction movies; artificial intelligence and machine learning are finding real world applicability and adoption.

Artificial intelligence has been a dream for many ever since Alan Turing wrote his seminal 1950 paper “Computing Machinery and Intelligence”. In Turing’s paper, he asked the fundamental question, “Can Machines Think?” and contemplated the concept of whether computers could communicate like humans. The birth of the AI field really started in the summer of 1956, when a group of researchers came together at Dartmouth College to initiate a series of research projects aimed at programming computers to behave like humans. It was at Dartmouth where the term “artificial intelligence” was first coined, and concepts from the conference crystallized to form a legitimate interdisciplinary research area.

Over the next decade, progress in AI experienced boom and bust cycles as advances with new algorithms were constrained by the limitations of contemporary technologies. In 1968, the science fiction film 2001: A Space Odyssey helped AI leave an indelible impression in mainstream consciousness when a sentient computer – HAL 9000 – uttered the famous line, “I’m sorry Dave, I’m afraid I can’t do that.” In the late 1970s, Star Wars further cemented AI in mainstream culture when a duo of artificially intelligent robots (C-3PO and R2-D2) helped save the galaxy.

But it wasn’t until the late 1990s that AI began to transition from science fiction lore into real world applicability. Beginning in 1997 with IBM’s Deep Blue chess program beating then current world champion Garry Kasparov, the late 1990s ushered in a new era of AI in which progress started to accelerate. Researchers began to focus on sub-problems of AI and harness it to solve real world applications such as image recognition and speech. Instead of trying to structure logical rules determined by the knowledge of experts, researchers started to work on how algorithms could learn the logical rules themselves. This trend helped to shift research focus into Artificial Neural Networks (ANNs). First conceptualized in the 1940s, ANNs were invented to “loosely” mimic how the human brain learns. ANNs experienced a resurgence in popularity in 1986 when the concept of backpropagation gradient descent was improved. The backpropagation method reduced the huge number of permutations needed in an ANN, and thus was a more efficient way to reduce AI training time.

Even with advances in new algorithms, neural networks still suffered from limitations with technology that had plagued their adoption over the previous decades. It wasn’t until the mid 2000s that another wave of progress in AI started to take form. In 2006, Geoffrey Hinton of the University of Toronto made a modification to ANNs, which he called deep learning (deep neural networks). Hinton added multiple layers to ANNs and mathematically optimized the results from each layer so that learning accumulated faster up the stack of layers. In 2012, Andrew Ng of Stanford University took deep learning a step further when he built a crude implementation of deep neural networks using Graphical Processing Units (GPUs). Since GPUs have a massively parallel architecture that consists of thousands of cores designed to handle multiple tasks simultaneously, Ng found that a cluster of GPUs could train a deep learning model much faster than general purpose CPUs. Rather than take weeks to generate a model with traditional CPUs, he was able to perform the same task in a day with GPUs.

Essentially, this convergence – advances in software algorithms combined with highly performant hardware – had been brewing for decades, and would usher in the rapid progress AI is currently experiencing.

Why Is AI Taking Off Now?

There are four main factors driving the adoption of AI today:

More Data. AI needs a huge amount of data to learn, and the digitization of society is providing the available raw material to fuel its advances. Big data from sources such as Internet of Things (IoT) sensors, social and mobile computing, science and academia, healthcare, and many more new applications generate data that can be used to train AI models. Not surprisingly, the companies investing most in AI – Amazon, Apple, Baidu, Google, Microsoft, Facebook – are the ones with the most data.

Cheaper Computation. In the past, even as AI algorithms improved, hardware remained a constraining factor. Recent advances in hardware and new computational models, particularly around GPUs, have accelerated the adoption of AI. GPUs gained popularity in the AI community for their ability to handle a high degree of parallel operations and perform matrix multiplications in an efficient manner – both are necessary for the iterative nature of deep learning algorithms. Subsequently, CPUs have also made advances for AI applications. Recently, Intel added new deep learning instructions to its Xeon and Xeon Phi processors to allow for better parallelization and more efficient matrix computation. This is coupled with improved tools and software frameworks from its software development libraries. With the adoption of AI, hardware vendors now also have the chip demand to justify and amortize the large capital costs required to develop, design, and manufacture products exclusively tailored for AI. These advancements result in better hardware designs, performance, and power usage profiles.

More Sophisticated Algorithms. Higher performance and less expensive compute also enable researchers to develop and train more advanced algorithms because they aren’t limited by the hardware constraints of the past. As a result, deep learning is now solving specific problems (e.g., speech recognition, image classification, handwriting recognition, fraud detection) with astonishing accuracy, and more advanced algorithms continue to advance the state of the art in AI.

Broader Investment. Over the past decades, AI research and development was primarily limited to universities and research institutions. Lack of funding combined with the sheer difficulty of the problems associated with AI resulted in minimal progress. Today, AI investment is no longer confined to university laboratories, but is pervasive in many areas – government, venture capital-backed startups, internet giants, and large enterprises across every industry sector.

Wrapping Up Part 1

That wraps up the first part of our 4-part blog series. In Part 2, we’ll discuss the differences between AI, Machine Learning, and Deep Learning.

Remember, if you want to get started right now, download the complete Deep Learning and Artificial Intelligence white paper.


          Fraude digital y cibercrimen ¿Estamos perdiendo la guerra?        

“Una cosa nunca es completa en sí misma, sino en relación con lo que le falta”.
Jacques Derridá.
Introducción
En una realidad donde se aceleran los cambios tecnológicos y las tendencias y expectativas se vuelven volátiles e inciertas, las organizaciones se encuentran en una carrera incesante por mantener su posición privilegiada en un segmento de mercado, tratando de “sensar y responder” (Bradley y Nolan, 1998) primero que otros o buscando nuevos horizontes para conquistar “tierras inexploradas”, asumiendo los riesgos que este ejercicio conlleva, como quiera que no hay un mapa concreto del territorio y su construcción tomará tiempo y posiblemente muchas lecciones por aprender (Calvo, 2016).

En este escenario, las organizaciones criminales ha sabido capitalizar rápidamente las condiciones cambiantes del entorno, su capacidad para detectar anticipadamente las nuevas variantes de sus “negocios”, han permitido una evolución rápida adaptación que les permite una movilidad y agilidad, que desconcierta a muchos entes de policía judicial en el mundo.

Esta condición de ductilidad frente a la incertidumbre, hace que las redes delincuenciales, sean capaces de enfrentar la inestabilidad que supone navegar sobre algo que no conocen, mucha veces con temeridad y osadía, para concretar luego estrategias más concretas que los lleven a realizar con mayor tranquilidad, e incluso invisibilidad, sus acciones contrarias al ordenamiento jurídico nacional e internacional.

Frente a esta realidad, la comunidad internacional viene aumentando su capacidad de monitoreo y detección con el fin de leer con mayor claridad las tendencias que las actividades de estos “facinerosos” generan a fin de establecer escenarios que les permitan actuar frente al marco legal y así dar cuenta de los resultados de dichas acciones al margen de la ley.

Esta lucha asimétrica planteada entre “policías y ladrones”, gravita sobre un modelo causa-efecto que asiste las reflexiones de aquellos que generan políticas públicas al respecto. Un paradigma mecánico que se concentra exclusivamente en las prácticas reconocidas y los marcos validados que permiten cierto margen de acción, que supone un contexto conocido y donde el Estado en su función preponderante tiene la capacidad de influir, disciplinar y castigar.

Así las cosas, cuando el marco de acción del analista o de los cuerpos de acción policial se mantienen bajo los paradigmas conocidos y probados, pocas oportunidades para pensar diferente se van a plantear y las propuestas o soluciones que se generen estarán rodeadas de las mismas condiciones que los estándares sugieren. Por lo tanto, su capacidad de “sorprenderse con la realidad”, de “pensar en el margen de las hojas” quedará limitada, abriendo espacios para que los delincuentes capturen “mayor valor” en sus acciones, creando la inestabilidad que compromete la confianza de los ciudadanos.

En consecuencia, la guerra que se libra a nivel internacional frente a la delincuencia y el fraude, ahora en el contexto digital, requiere una revisión conceptual, habida cuenta que los métodos y técnicas que los “amigos de lo ajeno” desarrollan, no solo llevan implicaciones de comportamiento y conocimiento concreto de la realidad que quieren conquistar, sino la apertura y capacidad de reinventarse en cada instante para lograr el factor sorpresa que destruye la zona cómoda de los analista y revela la limitada capacidad de anticipar, requerida en esta nueva realidad digital, por parte de los entes policiales y organismos multilaterales que asisten estas actividades.

Observar el sistema, no es entender el sistema
Cada vez que una analista de fraude o un investigador policial se enfrenta al reto de la delincuencia transnacional y digital, lo hace desde sus conocimientos y reflexiones previas, un ejercicio que recaba en sus supuestos propios de la realidad, los cuales son resultados de sus procesos internos que usa para construir su percepción o cognición particular (Vanderstraeten, 2001).

Por lo tanto, la capacidad de observación y distinción de rarezas, inconsistencias y contradicciones (Charan, 2015) que debe desarrollar un “agente de la ley y el orden” en un escenario asimétrico como el actual, supone mantener una visión ampliada de su realidad, que implica cuestionar sus supuestos de base, para quebrar sus lentes actuales con los cuales se enfrenta al mundo y así tener mayor oportunidad para ver lo que los “bandidos digitales” pueden llegar a ver.

Muchas veces los entrenamientos y capacitaciones sobre seguridad y control, que generalmente están fundados en currículos establecidos, competencias requeridas y didácticas de repetición y memorización (Cano, 2016), a los cuales asisten los analistas de fraude y de la delincuencia digital, establecen un marco de actuación que permiten una participación conocida y estándar de estos profesionales, que da cuenta de las actividades naturales y propias de los procedimientos criminalísticos.

En consecuencia, la necesidad de estar ajustados a un protocolo particular y al mismo tiempo comprender la inestabilidad que provoca la acción criminal, enfrenta a los profesionales antifraude y entes del estado, a un dilema de acción que compromete su margen de actuación habida cuenta que, su sesgo particular de orden y estructura, entra en tensión con la entropía, volatilidad y ambigüedad que subyace en una actividad criminal, la cual buscarán encuadrar dentro de los patrones de razonamiento estructurales que estos agentes de la ley tienen en su formación.

Lo anterior, demanda desarrollar un cambio de aproximación conceptual y cognitiva, que invite no a observar la acción criminal como algo puntal con sus resultados, sino a construir y revelar el sistema que lo contiene. Esto es, establecer las redes que conectan los hechos, lo que necesariamente demanda superar la vista lineal de una investigación, para armonizar los contrarios inherentes a las propuestas de los criminales: lo regular y lo irregular, lo sincrónico y lo asincrónico, lo ofensivo y lo defensivo, lo global y lo local.

Esta aproximación, que si bien reta los procedimientos actuales de los agentes del orden, establece una posibilidad de actuación enriquecida como quiera que no es solamente conocer los alcances de una acción delictiva digital, sino entender y develar el flujo que se genera entre la legalidad y la ilegalidad, como una vista extendida del actuar del delincuente que expone las distinciones y detalles que previamente ha elaborado para concretar su acción contraria al orden.

Disuadir y enfrentar, distinciones complementarias en la lucha contra el crimen y el fraude digital
Si se logra concretar el entendimiento de la armonía de los contrarios en el actual de los profesionales antifraude y especialistas en crimen digital, es posible cambiar las acciones que se emprenden para comprender y anticipar las propuestas de los bandidos en un entorno digitalmente modificado.

El ciberespacio, como creación humana y maleable, está en constante cambio y requiere de mentes abiertas para poder observar las posibilidades que se pueden plantear tanto para movilizar ideas novedosas, como para consolidar conductas abiertamente contrarias a la ley (Fischerkeller y Harknett, 2017). Este escenario, ausente de gobernabilidad central y resiliente a situaciones adversas, funda un entorno natural para que aquellos con mente disruptiva, pasión y conocimiento establezcan reglas novedosas que se contagien y creen tendencias que muchos no fueron capaz de movilizar.

Si lo anterior es correcto, las tendencias de la cibercriminalidad y el fraude abundan en acciones estratégicamente dirigidas y algunas veces inesperadamente logradas, donde el sabotaje, el espionaje y la corrupción (ídem) son parte del discurso que estos conglomerados delincuenciales configuran, para crear escenarios que comprometan la estabilidad de la sociedad y creen el incierto que destruye la confianza de los ciudadanos respecto de sus posibilidades en un entorno como el ciberespacio.

Con el sabotaje, se concretan acciones que debilitan o destruyen los logros económicos y afectan la infraestructura clave de las organizaciones o naciones. Con el espionaje incursionan dentro de los linderos de las empresas o naciones para extraer información sensible para desarrollar sus acciones criminales y con la corrupción debilitan la autoridad y el buen juicio sobre las decisiones, capturando la soberanía de la acción empresarial o nacional, despejando el terreno para actuar con mayor libertad y menos supervisión.

Si entendemos que el escenario de actuación de los criminales no puede ser objetivamente representado dentro del contexto social (Vanderstraeten, 2001) y que, por lo tanto, cada analista o profesional antifraude o especialista en criminalidad digital no puede ser entrenado para distinguir con claridad estas actuaciones, es claro que sólo podemos observar y distinguir tanto como la capacidad de comprensión colectiva que podamos construir. Esto es, desarrollar una ventaja estratégica superior que disuada a los contrarios en su propio terreno y deconstruya la acción de la fuerza y el control estándar, frente a lo que esperan los delincuentes.

La disuasión como estrategia de lucha contra la delincuencia establece un referente práctico que debe ser creíble y validado por el escenario social donde se construye. Disuadir al atacante informático o a un defraudador empresarial, requiere crear un entorno de imaginarios sociales reforzados desde las creencias, valores y actitudes, que confirmen que la organización o la nación conocen sus métodos y sus acciones opacas, por lo cual cualquier movimiento o sugerencia en este sentido tendrá un reflector que alerte sobre aquel actuar que puede motivar una acción contraria que deteriore la confianza imperfecta (Cano, 2016b).

Es claro que la delincuencia cuenta con recursos ilimitados para crear contextos de contrainteligencia que son capaces de envolver a los investigadores forenses o analistas de fraude más expertos, para confundirlos y llevarlos fuera de su alcance, sin embargo, en la medida que el tejido social se haga más resistente a las sugerencias de la delincuencia, habrá menos espacio para concretar labores tan elaboradas como operaciones totalmente normales y lícitas, que ocultan una estrategia de corrupción que pasa desapercibida frente al más escéptico de los profesionales antifraude o especialista en crimen digital, sin que los controles vigentes se enteren de dicha transacción.

Por tanto, la disuasión combinada con una estrategia de controles internos debidamente probados y articulados en los puntos de mayor riesgo (ver figura 1), establece un continuo de monitoreo y revisión que define patrones y condiciones que se pueden cambiar frente al posible infractor, como quiera que los controles no van a ser estáticos, así como sus niveles de sensibilidad para generar las alertas. Un control dinámico genera mayor incertidumbre para el agresor.


Figura 1. Disuadir y enfrentar. Conceptos complementarios

Entre mayor inestabilidad pueda generar el sistema de seguridad y control, frente a la forma, sensibilidad y alcance de sus acciones, esto es, ajustes dinámicos de fuentes de verificación, inclusión de observadores de disciplinas distintas, cambios de patrones en la validación y control previsto y un permanente aprendizaje/desprendizaje de las tendencias de los comportamientos de las transacciones y las personas, mayor será la variedad que los analistas van a tener para comprender los siguientes movimientos de los atacantes o defraudadores.

En este sentido, los avances tecnológicos establecen alternativas de interés basadas en algoritmos de aprendizaje profundo (Marr, 2016), que ya no solamente correlacionan información, sino que dan pautas y pistas de siguientes movimientos, con el fin de despertar la imaginación de los analistas y especialistas en fraude y crimen digital, para entrar en el mismo territorio de los atacantes y delincuentes, donde es posible observar y distinguir posibilidades de acción más que probabilidades de éxito de las mismas.

Amén de lo anterior, el disuadir y el enfrentar son parte del continuo de opciones que los analistas y entes de policía judicial deben comprender, pues al final del día no es doblegar al adversario lo que se requiere, es concretar una posición privilegiada en el mismo entorno donde este opera, para poder actuar de forma efectiva, es decir, disuadirlo de la acción que planea o ejecuta, identificando y superando las causas raíces que motivan y habilitan dicho actuar.

Reflexiones finales
Cuando observamos los esfuerzos en la lucha contra el fraude y la delincuencia digital desde el paradigma causa-efecto, la sensación que se obtiene es que estamos perdiendo la guerra y que el enemigo cada vez se fortalece y mejora sus técnicas para sorprender a la sociedad de formas inesperadas.

Sin embargo, cada vez más los entes de policía judicial comprenden que en un escenario de confrontación donde las capacidades del enemigo no se conocen, donde este puede mimetizarse de formas amigables e inciertas, incluso a la vista de los mismos especialistas, se hace necesario superar la vista mecanicista del entendimiento de la delincuencia y el fraude en el contexto digital y migrar hacia un entendimiento más relacional que ofrezca pistas sobre el escenario donde actúan y crean sus propios modelos.

En consecuencia, crear una estrategia de disuasión y control que no responda a un parámetro determinado, sino a una evolución de “sensar y responder”, que habilite una rápida adaptación de los saberes previos de los analistas y especialistas en fraude y crímenes digitales, es una exigencia propia del contexto actual, habida cuenta que la inestabilidad del territorio donde opera ahora la delincuencia, exige mayores niveles de anticipación y acción que balancee el tablero de operaciones entre los participantes: policías y ladrones.

Para ello, la información se convierte en un activo estratégico (Bebber, 2017) para confrontar aquello que se conoce y crear marcos de actuación que anticipen los movimientos de la criminalidad, y así tratar de sorprenderla en su propio territorio, superando el enfrentamiento estéril y desgastador entre buenos y malos.

El reto por tanto consiste en armonizar las posturas inestables de los asaltantes y estafadores digitales, dentro de escenarios prospectivos y disruptivos que se puedan crear con los nuevos adelantos tecnológicos, que permitan ver de forma distinta la evolución de una confrontación que continúa desde la antigüedad, donde el forajido es capaz de pensar distinto y sin restricciones para llevar a cabo sus acciones criminales, y el analista o agente del orden, sólo puede actuar dentro de los cánones de que le dicta el ordenamiento jurídico establecido.

Así las cosas, entender este enfrentamiento irregular, inestable, asincrónico, ofensivo y asimétrico donde los medios se convierten en los fines, demanda demarcar un nuevo terreno de análisis y acción, donde los observadores y agentes (analistas y delincuentes) son capaces de reinterpretar sus propias actuaciones de forma independiente, con el fin de mantener un mínimo de paranoia bien administrada como soporte fundamental de la confianza imperfecta que cada empleado y ciudadano asume, al ser partícipe de una realidad volátil, incierta, compleja y ambigua.

Referencias
Bebber, R. (2017) Treating information as a strategic resource to win the “information war”. Orbis. Foreign Policy Research Institute. Summer. Doi: 10.1016/j.orbis.2017.05.007. 394-403
Bradley, S. y Nolan, R. (Eds) (1998) Sense and respond: capturing value in the network era. USA: Harvard Business School Press.
Calvo, C. (2016) Del mapa escolar al territorio educativo. Disoñando la escuela desde la educación. La Serena, Chile: Editorial Universidad de la Serena.
Cano, J. (2016) La educación en seguridad de la información. Reflexiones pedagógicas desde el pensamiento de sistemas. Memorias 3er Simposio Internacional en “Temas y problemas de Investigación en Educación: Complejidad y Escenarios para la Paz”. Universidad Santo Tomás. Bogotá, Colombia. Recuperado de: http://soda.ustadistancia.edu.co/enlinea/congreso/congresoedu/2%20Pedagogia%20y%20dida%B4ctica/2%209%20LA%20EDUCACION%20EN%20SEGURIDAD%20DE%20LA%20INFORMACION.pdf
Cano, J. (2016b) Protección de la información. Un ejercicio de confianza imperfecta. Blog IT-Insecurity. Recuperado de: http://insecurityit.blogspot.com.co/2016/09/proteccion-de-la-informacion-un.html
Charan, R. (2015) The attacker’s advantage. Turning uncertainty into breakthrough opportunities. New York, USA: Perseus Books Groups.
Fischerkeller, M. y Harknett, R. (2017) Deterrence is not credible strategy for cyberspace. Orbis. Foreign Policy Research Institute. Summer. Doi: 10.1016/j.orbis.2017.05.003. 381-393
Marr, B. (2016) What Is The Difference Between Deep Learning, Machine Learning and AI? Forbes. Recuperado de: https://www.forbes.com/sites/bernardmarr/2016/12/08/what-is-the-difference-between-deep-learning-machine-learning-and-ai
Vanderstraeten, R. (2001) Observing systems: a cybernetic perspective on system/environmental relations. Journal for theory of social behavior. 31, 3. 297-311.

          í´ë¼ìš°ë“œì— 딱 맞는 MXNet의 5가지 딥러닝 학습 기능        

Apache MXNet  (인큐베이팅 프로젝트)는 최첨단 딥러닝(Deep Learning) 학습 모델 제작을 지원하는 확장 성이 뛰어난 오픈 소스 프레임 워크입니다. 이를 통해 CNN (Convolutional Neural Network), LSTM (Long Term Memory Network) 등을 만들 수 있고, Python, Scala, R 및 Julia를 포함한 다양한 언어를 지원합니다.

이 글에서는 MXNet이 AWS 클라우드 개발자 친화적인 프레임워크로서 자리 매김하는 몇 가지 독특한 기능을 소개합니다.  Python에서 MXNet을 사용하여 신경망 코딩을 하는 분을 위한 한 장짜리 기능 요약집도 하단에 있으니, 많이 참고해 보시기 바랍니다.

#1  코드 몇 줄로 다중 GPU 학습 지원
다중 GPU 기반 학습 실행 기능은 MXNet 아키텍처의 핵심 부분입니다. 모델을 학습시키려는 장치 목록을 전달하면 됩니다. 기본적으로 MXNet은 데이터 병렬 처리를 사용하여 여러 GPU에서 작업 부하를 분할합니다. 예를 들어, GPU가 3개인 경우 각 GPU는 전체 모델의 사본을 받고 각 데이터 배치(Batch)의 1/3로 나눠 학습을 진행합니다.

import mxnet as mx 
# Single GPU
module = mx.module.Module(context=mx.gpu(0))
# Train on multiple GPUs
module = mx.module.Module(context=[mx.gpu(i) for i in range(N)], ...)

MXNet은 다중 GPU 혹은 다중 서버 기반 학습에서 가장 뛰어난 효율을 보이고 있습니다.

#2 다중 서버 기반 학습 가능
MXNet은  다중 서버에서 여러 GPU에 대한 학습 또한 간소화하도록 설계한 분산형 딥러닝 학습 프레임 워크입니다. 서버 클러스터 전체에서 학습을 하려면, 모든 컴퓨터에 MXNet을 설치하고 SSH를 통해 서로 통신 할 수 있는지 확인한 다음 서버 IP가 포함 된 파일을 만들어야 합니다.

$ cat hosts 
192.30.0.172 
192.30.0.171
$ python ../../tools/launch.py -n 2 --launcher ssh -H hosts python train_mnist.py --network lenet --kv-store dist_sync

MXNet은 키-밸류 스토어를 사용하여 서버 간의 그라디언트와 파라미터를 동기화할 수 있습니다. 이를 통해 분산 학습을 수행할 수 있으며, 본 기능은 USE_DIST_KVSTORE = 1을 사용하여 MXNet을 새로 컴파일하면 됩니다.

#3 데이터는 Amazon S3!
MXNet에서 데이터 반복자(iterators)는 Python iterator 객체와 비슷합니다. 단, 해당 레이블과 함께 “n”개의 학습 예제가 포함 된 DataBatch 객체로 데이터 배치(batch)를 반환한다는 점이 다릅니다. MXNet에는 NDArray 및 CSV와 같은 공통 데이터 유형에 대해 미리 작성된 효율적인 데이터 반복자를 가지고 있습니다. 또한, HDFS와 같은 분산 파일 시스템에서 효율적인 I/O를 위해 바이너리 형식을 사용하기도 합니다. mx.io.DataIter 클래스를 확장하여 사용자 정의 데이터 반복기를 만들 수 있습니다. 이 기능을 구현하는 방법에 대한 자세한 내용은 기본 튜토리얼을 참조하시면 됩니다 .

특히, Amazon S3 (Amazon Simple Storage Service)는 대량의 데이터를 매우 저렴한 비용으로 저장해야 하는 고객에게 유용합니다. MXNet에서는 데이터를 디스크에 직접 다운로드 할 필요 없이 RecordIO, ImageRecordIO, CSV 또는 NDArray 형식의 Amazon S3에 저장된 데이터를 참조하는 반복자를 만들 수 있습니다.

data_iter = mx.io.ImageRecordIter(     
     path_imgrec="s3://bucket-name/training-data/caltech_train.rec",
     data_shape=(3, 227, 227),
     batch_size=4,
     resize=256)

# 4 신경망 시각화 기능
MXNet에서는 신경망 아키텍처를 시각화 할 수 있도록 Graphviz와 통합되어 있습니다. 네트워크 시각화를 생성하려면, node_atters 속성으로 정의한 대로 네트워크의 모양과 함께 네트워크의 마지막 레이어를 참조하는 심볼을 사용 합니다. 아래 예제는 LeNet 표준 CNN 을 시각화하는 방법을 보여줍니다 .

mx.viz.plot_network(symbol=lenet, shape=shape)

자세한 코드 및 구현 지침은 이 자습서를 참조하십시오.

#5 프로파일 러 지원
MXNet에는 USE_PROFILER = 1 플래그를 통해 사용 가능한 내장 프로파일러가 있습니다 . 이를 통해, 네트워크(심볼 수준) 실행 시간을 계층 별로 분류할 수 있습니다. 이 기능은 일반적인 프로파일링 도구인  nvprof  및  gprof을  보완하며, 함수, 커널, 또는 학습 수준에서, 운영자 수준에서 처리할 수 있게 합니다. 환경 변수를 사용하여 전체 Python 프로그램에 대해 활성화 할 수 있습니다 . 또는 아래와 같이 프로그램의 하위 집합에 코드를 통합하여 활성화 할 수 있습니다.

mx.profiler.profiler_set_config(mode='all', filename='output.json')     
mx.profiler.profiler_set_state('run')      
# Code to be profiled goes here...      
mx.profiler.profiler_set_state('stop')

프로파일링 출력을 Chrome과 같은 웹 브라우저에 로드하고 다음과 같이 브라우저의 추적 ( Chrome 브라우저에서 chrome://tracing)으로 이동하여 프로필을 볼 수 있습니다.

위 스크린 샷은 프로파일링 도구를 사용하여 MXNet에 구현 된 원래의 LeNet 아키텍처로 MNIST 데이터 세트를 학습하는 프로파일을 보여줍니다 .

One more thing: MXNet Cheat Sheet
이제 MXNet의 고유한 기능을 기반으로 신경망 학습을 시작하는데 아래 치트 시트가 도움이 될 것입니다. 여기에는 CNN, RNN/LSTM, 선형 회귀 및 로지스틱 회귀에 대한 몇 가지 일반적인 아키텍처가 포함되어 있습니다. 이를 사용하여, 데이터 반복자 및 Amazon S3 반복기를 작성하고 체크 포인트를 구현하며 모델 파일을 저장하는 방법에 대한 간단한 코드가 있습니다.

Apache MXNet 치트 시트


MXNet 커뮤니티는 Gluon 이라는 동적인 사용하기 쉬운 명령형 인터페이스를 지원하기 시작했고, MXNet으로 딥러닝을 시작하려면 튜토리얼을 참조하십시오 .

이 글은 Sunil Mallya이 쓴 Exploiting the Unique Features of the Apache MXNet Deep Learning Framework with a Cheat Sheet의 한국어 번역입니다.

필수 참고 자료


          MXNet 기반 추천 오픈 소스 딥러닝 프로젝트 모음        

Apache MXNet 은 일반 개발자가 손쉽게 딥러닝(Deep Learning) 모델을 구축, 학습 및 실행하는 데 도움을 주는 오픈 소스 라이브러리입니다. 이전 시리즈에서 MXNet API 및 주요 기능, 활용 방법에 대해 소개했습니다.

이 글에서는 MXNet을 다양한 유스 케이스에 적용하는 특징적인 오픈 소스 프로젝트를 소개합니다. (참고로 MXNet Model Zoo에는 다양한 주요 딥러닝 학습 모델 사례가 있으니, 먼저 살펴 보시기 바랍니다!)

#1 — 이미지 객체 인식
이 프로젝트는 하나의 이미지에서 여러개의 객체를 탐지하는  것으로 mxnet-ssd (논문 링크)라는 프로젝트를 개량한 것으로 MXNet의 특징이라고 할 수 있는, 멀티 GPU에서 성능을 향상 시킨 것입니다.

precedenceguo/mx-rcnn
mx-rcnn – Faster R-CNN, an MXNet implementation with distributed implementation and data parallelization

이 프로젝트는 아래 연구 결과를 기반으로 합니다.

#2 — 스마트폰용 이미지 분석 프로젝트
MXNet 입문 마지막 가이드에서 살펴본 대로,  Inception v3 을 사용하면, 모바일 기기에서도 실시간으로 이미지 분석이 가능합니다.

아래 프로젝트는 안드로이드 및 iOS에서 사용할 수 있는 이미지 인식 프로젝트입니다.

dneprDroid/ImageRecognizer-iOS

dneprDroid/ImageRecognizer-Android

#3 — 얼굴 인식 및 안면 감지 기능
이 프로젝트는 Amazon Rekognition의 얼굴 인식과 유사한 기능을 제공합니다. 좀 더 자세한 구현을 하고 싶은 경우, 좋은 출발점이 될 수 있습니다.

tornadomeet/mxnet-face

이 프로젝트는 아래 연구 결과를 기반으로 합니다.

#4— 자동차 번호판 인식하기

이 프로젝트는 81 % 정확도로 MacBook Pro에서 초당 9 매의 번호판 인식을 수행할 수 있습니다. 약간의 노력을 더 한다면 다른 문자 인식 사용 사례에 적용할 수 있습니다 🙂

szad670401/end-to-end-for-chinese-plate-recognition

#5 — Sockeye : 기계 번역 프로젝트

Sockeye 프로젝트는 MXNet에 기반한 신경망 기계 번역(Neural Machine Translation)을 위한 시퀀스-시퀀스(sequence-to-sequence) 프레임 워크입니다.  AWS에서 개발하고 있으며, 더 자세한 것은 MXNet 기반 Sockeye를 통한 기계 번역 학습 해보기를 참고하시기 바랍니다.

awslabs/sockeye
sockeye – Sequence-to-sequence framework with a focus on Neural Machine Translation based on MXNet

AWS  기반 배포 방법

지금까지 다른 Python 애플리케이션과 마찬가지로 Amazon EC2 인스턴스에서 MXNet 코드를 실행했습니다. AWS에서 애플리케이션을 실행할 수 있는 대체 방법(콘테이너 및 서버리스)이 있으면, 이는 MXNet에 적용할 수 있겠죠.

#6— Amazon ECS와 코드 도구를 통한 MXNet API 지속적 배포 방식

이 프로젝트는 AWS CloudFormation 템플릿을 사용하여 MXNet 모델 또는 애플리케이션 코드의 변경 사항을 파이프 라인을 통해 배포, 구성 및 조율하는 자동화 된 워크 플로우를 생성할 수 있습니다. CodePipeline과 CodeBuild를 사용하여 지속적 전달(CD) 방식이 가능하고,  몇 분 만에 사용자가 사용할 수 있습니다.

#7 —  MXNet Lambda 함수로 배포 하기

AWS Lambda를 통해 MXNet을 사용해 미리 학습된 모델을 통해 이미지 인식 등을 해 볼 수 있는 프로젝트입니다. Serverless Application Model (SAM) 템플릿을 통해 서버리스 API 엔드포인트도 자동으로 구현합니다.

지금까지 다양한 MXNet 기반의 추천 오픈 소스 프로젝트를 살펴 보았습니다. 혹시 더 추천해 주실만한 프로젝트가 있으면 알려주세요!

연재 순서


          Apache MXNet에 대한 모든 것!        

아마존의 CTO인 Werner Vogels 박사는 MXNet – Deep Learning Framework of Choice at AWS라는 글에서 확장 능력, 개발 속도, 이동성 등의 다양한 요인을 비추어 볼 때, MXNet이 가장 좋은 인공 지능 애플리케이션 개발을 위한 딥러닝 프레임웍이라고 판단하고, 이를 기반한 딥러닝 서비스 개발 지원 및 오픈 소스 지원에 대한 의지를 피력한 바 있습니다.

이 글은 다양한 오픈 소스 딥러닝 프레임웍 중에 아마존이 선택한 Apache MXNet에 관한 다양한 한국어 자료들을 모아서 제공하는 것을 목적으로 합니다.

Apache MXNet은 개발자들에게 친숙한 심볼릭(Symbolic)과 명령형(imperative) 프로그래밍의 혼합 방식을 지원할 뿐만 아니라 CPU와 GPU 연산을 지원하고, 특히 GPU 클러스터에 최적화된 엔진을 사용해서 성능이 뛰어납니다.

또한, 실무적으로 많이 사용하는 Python, C++, R, Scala, Julia, Matlab, and JavaScript을 지원하고, 모바일 기기 부터 서버까지 다양한 디바이스를 지원하여 산업계에서 응용하기에 매우 적합한 딥러닝 프레임워크입니다.

Apache MXNet 입문 가이드
이 시리즈는 AWS 테크 에반젤리스트인 Julien Simon이 연재한 MXNet 관련 글 모음의 번역 편집본으로 최근 각광 받고 있는 Deep Learning 라이브러리인 Apache MXnet을 개괄적으로 설명하려고 합니다.

이 글은 간단한 코드를 이해하는 개발자라면 기계 학습과 인공 지능을 잘 알지 못하는 분이라도 쉽게 따라올 수 있도록 했습니다. 너무 겁먹지 않으셔도 됩니다.

동영상
모두를 위한 딥러닝 강의로 유명한 홍콩과기대 김성훈 교수와 MXNet 코드 개발자인 Xingjian Shi가 함께 MXNet의 장점과 함께 간단한 딥러닝 학습 문제를 데모로 보여 드립니다. (슬라이드)  모두를 위한 딥러닝을 청취하신 분들이라면, Lab 강의에 대한 MXNet 소스 코드를 참고하셔도 됩니다!

Apache MXNet에 대해 간단하게 소개하고, AWS에서 Deep Learning AMI을 이용하여 Amazon EC2 인스턴스에서 MXNet을 구동하고, 테스트하는 방법을 살펴 봅니다. 또한, 분산 딥러닝 클러스터 생성 템플릿으로 멀티 GPU에서 구동하는 방법도 살펴 볼 수 있습니다.

사용자 모임

Apache MXNet에 대한 관심이 늘어나고 있고, 배우려는 분들과 질문/답변을 할 수 있도록 페이스북에 그룹을 만들었습니다. 관심 있는 분들 참여해 주시길…

더 자세한  정보

더 자세한 것은 AWS 블로그 MXNet 카테고리를 참고하셔도 됩니다.

앞으로 이 글에는 MXNet 기반 모델 학습 시리즈 한국어 번역 및 Amazon AI 블로그의 MXNet 관련 글 모음 등 다양한 정보를 소개할 예정입니다.


          [ZDNet 칼럼] 대용량 인공지능 플랫폼을 개발자들에게        

아마존은 사업 초기부터 인공 지능에 투자해 왔다. 아마존닷컴의 초창기 홈페이지를 보면 ‘Eyes & Editors’라는 기능이 있었는데, 이는 좋아하는 저자의 신규 서적에 대해 자동 검색 및 알림을 해 주는 에이전트 기반 서적 추천 엔진이다. 이미 2006년에 이러한 사용자 리뷰 및 행동 기반 추천을 통해 총 판매액의 35%가 추천 시스템에서 발생했다고 한다.

최근에는 머신 러닝 및 딥러닝 기법을 물류센터에 도입하기도 했다. 사용자가 물건을 온라인 장바구니에 담기만 해도 주문자의 위치, 상품의 위치와 포장 및 운송 경로를 자동으로 예측하여, ‘고객이 주문 전에 배송 계획 예측’하는 시스템을 운용하고 있다. 매 주간 총 500억회 이상 기계 학습을 기반한 예측을 하고 있다.

아마존닷컴 초창기 첫화면(출처: Internet Archive)

이러한 예측을 기반으로 전 세계 아마존의 물류 센터 중 13 곳에는 시범적으로 키바(KIVA)라는 무인 로봇을 도입했다. 이 로봇은 배송 물품을 자동으로 계산하고 운반해서 포장하는 직원 앞에 순차적으로 놓아 준다. 그 결과 기존 1시간 이상 걸리던 물류 순환 속도를 15분으로 단축하고, 재고 공간 50% 향상, 운영 비용 20% 개선의 효과를 거두었다.

아마존 창고를 책임지는 로봇 짐꾼 '키바'(왼쪽, 출처: CNet Korea), 아마존 물류창고의 AI 분석용 공개 데이터(오른쪽, 출처: AWS 홈페이지)

재미있는 점은 아마존 물류 센터에 상품을 보관하는 선반에는 크고 작은 다양한 물건이 무작위로 놓여져 있다. 사람이 직접 배송 물품을 포장하기 전에, 로봇으로 인해 예측된 물품이 옮겨지게 되는데, 이 때 물품 재고 및 내역 파악을 위해 컴퓨터 비전 기술과 함께 딥러닝을 통한 이미지 모델링 분석을 통해 상품의 배열 방식이 바뀌거나 이동하는 등 다양한 외부 요인에 상관 없이 재고 파악을 할 수 있다. 딥러닝 연구자를 위해 아마존 S3 공공 데이터에 선반 속 재고 상품 이미지 세트를 무료로 공개하기도 했다.

아마존은 최근에 ‘아마존 고’라는 새로운 형태의 무인 결제 오프라인 상점을 선보이기도 했다. 줄을 서서 기다릴 필요가 없는 ‘저스트 워크아웃(Just Walk Out)’이라는 기술을 통해 모바일 앱을 사용하여 상점에 입장, 원하는 제품을 선택하면 바로 가상 장바구니에 담기고, 상점을 나설 때 자동으로 결제가 되는 것이다. 상점내 각종 센서를 통해 컴퓨터 비전, 센서 퓨전 및 딥러닝과 같은 자율 주행 차량에 사용되는 것과 동일한 유형의 기술이 활용된다.

■ 개발자를 위한 머신 러닝 서비스 출시

이러한 내부적 기술적 토대를 기반으로 AWS는 2015년 4월 ‘아마존 머신러닝’ 서비스 공개 이후, AWS 클라우드를 사용하는 고객들의 요구에 맞게끔 다양한 인공 지능을 위한 플랫폼 옵션을 공개해왔다. AWS는 대규모 자원을 가지고 있거나 투자 여력이 있는 회사만이 할 수 있는 인프라나 플랫폼을, 누구나 이용할 수 있도록 하게 하는 목표를 갖고 있다. 인공 지능 분야도 예외가 아니다.

아마존 머신러닝 서비스는 기계 학습 전문 지식은 별로 없더라도 도메인 지식(혹은 대용량 데이터)을 보유한 개발자들이 사용할 수 있는 예측 분석을 제공하고 있다. 이는 아마존에서 내부적으로 사용하는 것과 동일한 기술로서, 모든 사용자가 AWS에서 바로 사용할 수 있다. 실제 많은 AWS 고객이 각종 위조 탐지, 쇼핑 분석 등에 이를 널리 활용하고 있다.

아마존 머신러닝 콘솔을 통한 모델 훈련 및 예측(출처:AWS홈페이지)

예를 들어, Hudl사는 스포츠 경기 데이터 및 비디오 분석 및 예측을 통해 코치와 운동선수가 경기를 준비하는데 도움을 받고 있다. Upserve는 식당 관리 시스템을 제공하는데, 아마존 머신러닝을 활용하고 있다.

“아마존 머신 러닝을 통해 저녁 시간대에 레스토랑에 방문할 전체 고객 수를 예측할 수 있게 되었다. 그 결과 레스토랑은 저녁 시간대 직원 배치를 효과적으로 계획해 서빙 할 수 있다고 Upserve의 브라이트 풀턴, 인프라 엔지니어링 담당이사는 전한다.”

뿐만 아니라, 스타트업인 AdiMap은 광고 데이터 분석을 통해 그 결과를 예측하고 있으며, 크라우드소싱 사기 탐지 플랫폼인 Fraud.net은 복잡성을 줄이고 새롭게 등장하는 사기 패턴을 이해하기 위해 아마존 머신러닝을 사용하고 있다. AWS는 작년 AWS 리인벤트 행사에서 애플리케이션 개발자도 막대한 투자를 요하는 인공 지능 기능을 API 형태로 활용할 수 있는 새로운 AI 서비스를 출시했다. 또한, 지금은 AI 서비스의 초기이며, 좀 더 전문적인 결과를 얻으려는 고객의 피드백에 따라 아마존 AI 플랫폼에서는 알고리즘 튜닝과 모델 트레이닝이라는 두 가지 축을 따라 유연한 플랫폼을 확장하고 있다.

■ 확장성 높은 고성능 딥러닝 플랫폼 활용

​일부 인공 지능 연구자 또는 데이터 과학자들은 대량 데이터를 가지고 있을 뿐 아니라 직접 데이터 모델링 및알고리즘을 만들어 튜닝 하는 데 필요한 기술을 얻고자 한다. 이러한 경우 대부분 한정된 물리적 장비와 MXNet 및 텐서플로(Tensorflow)와 같은 딥러닝 프로그래밍 프레임워크을 사용하게 된다.

딥러닝 연구에 필요한 확장성은 클라우드를 통해 해결할 수 있다. 신규 P2 인스턴스는 192GB의 GPU 메모리, 4만 개의 병렬 처리 코어, 70테라플롭스의 단정밀도 부동 소수점 성능 및 23테라플롭스의 배정밀도 부동 소수점 성능을 갖춘 최대 16개의 엔비디아 K80 GPU, 64개의 vCPU 및 732GiB의 호스트 메모리를 제공한다. 최대 16개의 GPU까지 GPU다이렉트 (피어 투 피어 GPU 통신) 기능을 제공하므로 여러 개의 GPU가 단일 호스트 내에서 함께 작동할 수 있다. 또한, 최대 20GB의 밴드위스 기반으로 스케일 아웃 방식으로 클러스터를 구성할 수 있다.

MXNet, 텐서플로, 토치(Torch), 카페(Caffe), 세라노(Theano) 및 쿠다(CUDA) 드라이버가 사전 설치되어있는 딥러닝 이미지(AMI)와 AWS 클라우드포메이션(CloudFormation)을 통해 슈퍼 컴퓨터급 딥러닝 클러스터를 원클릭으로 생성하고 삭제할 수 있다. 비용적인 측면에서도 효과적이다. 빠르게 성장하는 GPU 성능과 하드웨어 출시로 곧장 낡은 기종이 되어 버릴 뿐만 아니라 관리상으로도 애로가 많은 하이엔드 물리 장비를 직접 구매하지 않더라도, 클라우드를 통해 필요할 때마다(저렴한 스팟 인스턴스 등을 활용하여) 비용 효율적인 클러스터를 구성 및 운영할 수 있다.

GPU확장에 따라 선형 확장되는 MXNet의 처리량 및 속도 벤치마크 결과(출처: AWS블로그)

특히, 아마존은 클라우드에 최적화된 GPU 확장성에 뛰어난 MXNet을 주력 엔진으로 선택했다. MXNet은 정교한 맞춤형 인공 지능 시스템을 구축 할 수 있도록 지원하는 오픈소스 딥러닝 엔진으로, 다양한 기능 및 확장성을 가지고 있다.

예를 들어, 인기있는 이미지 인식 네트워크인 Resnet의 경우, MXNet을 통해 다른 엔진에 비해 2 배 높은 처리량을 제공해, 50%의 시간 동안 동일한 모델을 트레이닝 할 수 있다. 벤치 마킹 결과에 따르면, MXNet은 다른 엔진과 달리 수백 개의 GPU에 대한 선형 확장성을 보여 주어 클라우드에 적합하다.

아마존에서는 MXNet 커뮤니티와 협력하고 있으며, 얼마전 아파치 재단의 인큐베이팅 프로젝트로 승인 받아 많은 개발자들의 참여를 독려하고 있는 중이다.

이번 글에서는 아마존에서의 인공 지능 활용 사례와 이를 통해 축적된 인공 지능 기술을 맞춤형 솔루션을 원하는 연구자와 데이터 과학자들에게 어떻게 제공하고 있는지 알아보았다. 다음에는 스마트 애플리케이션을 개발하려는 일반 개발자들에게 서버리스(Serverless) AI 서비스를 구현하는 방법을 소개하고자 한다.

원문 링크


          [ZDNet 칼럼] AWS re:Invent – 클라우드, 현실 세계에 스며들다        

지난 주 미국 라스베이거스에는 클라우드 컴퓨팅 마니아 3만 2천 여명이 한 자리에 모였다. AWS 리인벤트(re:Invent) 2016에 참가하기 위해 전 세계에서 모여든 AWS 고객, 개발자 및 파트너 업체 종사자들이다. 클라우드 생태계에 정점에 있는 이들이 모인 이유는 업계의 리더 위치에 있는 AWS의 미래 전략을 듣기 위해서였다. 행사를 주최한 AWS는 re:Invent를 교육 및 배움의 장소로 자리매김하고자 했고, 다양한 고객의 사례를 서로 나누는 장을 마련했다.

최근 몇 년간 클라우드 컴퓨팅은 어떻게 진화해 왔을까? 넷플릭스의 아키텍처였다가 최근에 AWS 클라우드 전략 담당 부사장으로 영입된 아드리안 코크로프트는 그의 블로그에서 다음과 같이 썼다.

“2014년에 엔터프라이즈 기업들이 AWS에 테스트 및 신규 애플리케이션을 개발하기 위한 방편으로 AWS를 사용하기 시작하여, 2015년에는 대량 마이그레이션을 하거나 전체 데이터센터를 퍼블릭 클라우드로 대체하는 패턴을 보았습니다. 올해 들어 이 변화는 미디어 산업, 판매 산업에서 빠르게 도입을 하는가 하면, 은행, 보험 등 퍼블릭 클라우드를 사용하는데 규제가 강한 금융 산업도 여기에 동참하였습니다. 다음은 아마 에너지, 교통, 정부, 제조 및 헬스케어 분야의 얼리 어댑터들이 클라우드 시장을 이끄는 주자가 될 것입니다.” (Cloud Trends ? Where have we come from and where are we headed 중)

앤디 제시 AWS 최고경영자(CEO)는 2014년에는 클라우드는 새로운 표준(Normal)으로서 엔터프라이즈 기업들이 활용을 시작했음을 알렸고, 2015년 기조 연설에서는 클라우드가 가져온 7가지 자유라는 주제로 클라우드가 스스로 원하는 길을 개척할 수 있는 자유를 준다는 점을 설명하였다. 그렇다면 올해의 화두는 무엇일까? 그는 이제 사용자들이 클라우드를 통해 그 동안 해내지 못했던 것에 대해 무엇이든 해 낼 수 있다는 느낌을 준다는 ‘슈퍼 파워(Super Power)’ 도구로서 자리매김 했다는 점을 기조 연설을 통해 강조했다.

​

■클라우드가 없는 곳에도 클라우드를 넣다
아드리안의 언급대로, 앞으로 클라우드 활용이 익숙해진 IT 서비스 분야에서 제조, 유통, 판매 분야 등으로 확대될 전망이다. 이러한 실제 세계는 사실상 네트워크 연결이 제한적이거나 아예 존재하지 않는 극단적인 상황도 있다. 예를 들어, 옥수수 농장, 항공기 제조 공장, 병원등 산업 현장은 우리 생각과 많이 다르다.

일단 현장에서 생산되는 데이터들은 바로 클라우드로 옮기기도 애매하고, 일시적으로 현장에서 IT 환경을 꾸려야 한다. 이를 위해서는 로컬 환경에 클라우드와 궁합이 맞는 컴퓨팅 및 스토리지 서비스가 필요하다. Amazon Greengrass는 AWS Lambda와 AWS IoT를 결합한 로컬 디바이스로 주변 환경의 데이터 수집이나 프로그램 처리 등을 손쉽게 해준다. AWS Lambda는 서버리스 컴퓨팅 환경을 연 클라우드 함수 서비스로 임베디드 형식으로 로컬 컴퓨팅에 이식하게 됨에 따라 그 사용 범위가 훨씬 넓어질 전망이다.

Greengrass에서 사용된 임베디드 하이브리드 컴퓨팅 방식은 대량 로컬 데이터를 손쉽게 클라우드로 이전할 수 있는 데이터 운송 장치인 AWS Snowball에도 적용됐다. 새로 나온 AWS Snowball Edge는 용량을 100TB로 늘렸을 뿐만 아니라, 현장에서 사용하는 다양한 네트워크 및 데이터 어댑터를 지원한다. 특히, Greengrass에서 적용했던 것과 같이 AWS Lambda 함수를 기기 내부에 탑재하여, 손쉽게 Amazon Simple Storage Service(S3) 스토리지 버킷과 작업이 가능하다.

이번 기조 연설에서 무엇 보다 깜짝 발표는 기존의 소형 Snowball 기기뿐만 아니라, 엑사바이트(Exabite) 급 데이터를 대량으로 한번에 옮길 수 있는 AWS Snowmobile이라는 데이터 선적용 콘테이너 트럭을 선보였던 순간이었다. 인터넷으로 옮기려면 수백 년이 걸릴 데이터를 한꺼번에 몇 주안에 클라우드로 옮길 수 있다는 건, 이제 현실의 데이터 제약을 완전히 극복했다는 것을 입증했다.

AWS Snowmobile 데이터 이동 콘테이너 트럭(출처: @AWSCloud)

​
고객 발표사로서 이탈리아 기반 글로벌 전력회사인 Enel 이 올라온 것도 우연은 아니다. Enel은 국내에서는 잘 알려져 있지 않지만, 100 년 이상의 역사를 가지고 있으며, 6천100 만 고객 190만km의 배급망을 가진 전력 업계의 리더이며, Fortune 지의 ‘세상을 바꿀 기업’ 50 개사 중 5 위에 선정된 기업이다. 스마트 전력 이용률에서는 25%, 전 세계 발전량 3위, 신 재생 에너지, 송배전 망 거리 판매액 시가 총액도 1위를 차지했다.

그러나 리먼 브라더스 사태 이후 전력 수요와 GDP 및 석유 소비량과 결합이 없어지면서, 석유 가격 감소 및 청정 에너지에 대한 투자 증대 등 전력 산업도 변화하고 있다.

연사로 나선 인프라 담당 임원인 파비오 베로네세는 “이러한 전력 산업 변화 과정에서 Enel 역시 클라우드 우선 전략으로 변화를 꾀하게 되었으며, 2016 년 6 월까지 10,000개의 가상 서버, 30,000 CPU와 6PB 스토리지 마이그레이션을 완료하고 AWS를 활용하여 서버 프로비저닝을 3~4 주 걸리던 것을 이틀 내로 처리할 수 있었다”라고 전했다.

이를 통해 컴퓨팅 자원의 소비는 21% 감소, 스토리지는 60%까지 절감하였다. Enel의 다음 목표는 사내 IT 업무 100%를 클라우드로 전환하고, 오프라인 스마트 전력망 운영 체계에 있어 클라우드 기반 IoT 활용을 목표로 하고 있다고 언급했다.

■클라우드 기반 인공 지능으로 현실 세계와 접하다
이번 기조 연설의 메인 테마 중 하나는 단연 인공 지능(AI)이다. 이미 AWS는 머신 러닝 서비스를 1년 전에 공개한 바 있고, 최근에 GPU 기반 P2 인스턴스와 딥러닝 전용 AMIë