Computer Science | Artificial Intelligence » Klein-Abbeel - Artificial Intelligence, CS 188

Basic information

Year, page count: 2018, 1058 pages

Language: English

Uploaded: April 21, 2013

Size: 46 MB

Institution:
[UC] University of California


Content summary

CS 188: Artificial Intelligence - Introduction
Instructors: Dan Klein and Pieter Abbeel, University of California, Berkeley
[These slides were created by Dan Klein and Pieter Abbeel for CS 188 Intro to AI at UC Berkeley. All materials available at http://ai.berkeley.edu]

Course Staff
▪ Professors: Dan Klein, Pieter Abbeel
▪ GSIs: John Du, James Ferguson, Sergey Karayev, Michael Liang, Teodor Moldovan, Evan Shelhamer, Alvin Wong, Ning Zhang

Course Information
▪ Communication: Sign up at inst.eecs.berkeley.edu/~cs188
▪ Announcements on webpage
▪ Questions? Discussion on Piazza
▪ Staff email: cs188-staff@lists
▪ This course is webcast (Sp14 live videos) + Fa12 edited videos (1-11) + Fa13 live videos
▪ Course technology:
▪ New infrastructure
▪ Autograded projects, interactive homeworks (unlimited submissions!) + regular homework
▪ Help us make it awesome!

Course Information ▪ Prerequisites: ▪ (CS 61A or B) and (Math 55 or CS 70) ▪ Strongly recommended: CS61A, CS61B and CS70 ▪ There will be a lot of math (and programming) ▪ Work and Grading: ▪ 5 programming projects: Python, groups of 1 or 2 ▪ 5 late days for semester, maximum 2 per project ▪ ~9 homework assignments: ▪ Part 1: interactive, solve together, submit alone ▪ Part 2: written, solve together, write up alone, electronic submission through pandagrader [these problems will be questions from past exams] ▪ ▪ ▪ ▪ Two midterms, one final Participation can help on margins Fixed scale Academic integrity policy ▪ Contests! Source: http://www.doksinet Textbook ▪ Not required, but for students who want to read more we recommend ▪ Russell & Norvig, AI: A Modern Approach, 3rd Ed. ▪ Warning: Not a course textbook, so our presentation does not necessarily follow the presentation in the book. Source: http://www.doksinet Important This Week •

Important this week: • Register for the class on edx • Register for the class on piazza --- our main resource for discussion and communication • P0: Python tutorial is out (due on Friday 1/24 at 5pm) • One-time (optional) P0 lab hours this week • Wed 2-3pm, Thu 4-5pm --- all in 330 Soda • Get (optional) account forms in front after class • Math self-diagnostic up on web page --- important to check your preparedness for second half • Also important: • Sections start next week. You are free to attend any section, priority in section you signed up for if among first 35 to sign up. Sign-up first come first served on Friday at 2pm on piazza poll • If you are wait-listed, you might or might not get in depending on how many students drop. Contact Michael-David Sasson (msasson@cs.berkeleyedu) with any questions on the process • Office Hours start next week, this week there are the P0 labs and you can catch the professors after lecture Source: http://www.doksinet

Today ▪ What is artificial intelligence? ▪ What can AI do? ▪ What is this course? Source: http://www.doksinet Sci-Fi AI? Source: http://www.doksinet What is AI? The science of making machines that: Think like people Think rationally Act like people Act rationally Source: http://www.doksinet Rational Decisions We’ll use the term rational in a very specific, technical way: ▪ Rational: maximally achieving pre-defined goals ▪ Rationality only concerns what decisions are made (not the thought process behind them) ▪ Goals are expressed in terms of the utility of outcomes ▪ Being rational means maximizing your expected utility A better title for this course would be: Computational Rationality Source: http://www.doksinet Maximize Your Expected Utility Source: http://www.doksinet What About the Brain? ▪ Brains (human minds) are very good at making rational decisions, but not perfect ▪ Brains aren’t as modular as software, so hard to reverse engineer!

▪ “Brains are to intelligence as wings are to flight” ▪ Lessons learned from the brain: memory and simulation are key to decision making

A (Short) History of AI
Demo: HISTORY – MT1950.wmv

A (Short) History of AI
▪ 1940-1950: Early days
▪ 1943: McCulloch & Pitts: Boolean circuit model of brain
▪ 1950: Turing's “Computing Machinery and Intelligence”
▪ 1950-70: Excitement: Look, Ma, no hands!
▪ 1950s: Early AI programs, including Samuel's checkers program, Newell & Simon's Logic Theorist, Gelernter's Geometry Engine
▪ 1956: Dartmouth meeting: “Artificial Intelligence” adopted
▪ 1965: Robinson's complete algorithm for logical reasoning
▪ 1970-90: Knowledge-based approaches
▪ 1969-79: Early development of knowledge-based systems
▪ 1980-88: Expert systems industry booms
▪ 1988-93: Expert systems industry busts: “AI Winter”
▪ 1990-: Statistical approaches
▪ Resurgence of probability,

focus on uncertainty ▪ General increase in technical depth ▪ Agents and learning systems “AI Spring”? ▪ 2000: Where are we now? Source: http://www.doksinet What Can AI Do? Quiz: Which of the following can be done at present? ▪ ▪ ▪ ▪ ▪ ▪ ▪ ▪ ▪ ▪ ▪ ▪ Play a decent game of table tennis? Play a decent game of Jeopardy? Drive safely along a curving mountain road? Drive safely along Telegraph Avenue? Buy a weeks worth of groceries on the web? Buy a weeks worth of groceries at Berkeley Bowl? Discover and prove a new mathematical theorem? Converse successfully with another person for an hour? Perform a surgical operation? Put away the dishes and fold the laundry? Translate spoken Chinese into spoken English in real time? Write an intentionally funny story? Source: http://www.doksinet Unintentionally Funny Stories ▪ One day Joe Bear was hungry. He asked his friend Irving Bird where some honey was. Irving told him there was a beehive in the oak tree.

Joe walked to the oak tree. He ate the beehive The End ▪ Henry Squirrel was thirsty. He walked over to the river bank where his good friend Bill Bird was sitting. Henry slipped and fell in the river. Gravity drowned The End. ▪ Once upon a time there was a dishonest fox and a vain crow. One day the crow was sitting in his tree, holding a piece of cheese in his mouth. He noticed that he was holding the piece of cheese. He became hungry, and swallowed the cheese. The fox walked over to the crow The End [Shank, Tale-Spin System, 1984] Source: http://www.doksinet Natural Language ▪ Speech technologies (e.g Siri) ▪ Automatic speech recognition (ASR) ▪ Text-to-speech synthesis (TTS) ▪ Dialog systems Demo: NLP – ASR tvsample.avi Source: http://www.doksinet Natural Language ▪ Speech technologies (e.g Siri) ▪ Automatic speech recognition (ASR) ▪ Text-to-speech synthesis (TTS) ▪ Dialog systems ▪ Language processing technologies ▪ Question answering ▪ Machine

translation ▪ Web search ▪ Text classification, spam filtering, etc Source: http://www.doksinet Vision (Perception) ▪ Object and face recognition ▪ Scene segmentation ▪ Image classification Demo1: VISION – lec 1 t2 video.flv Images from Erik Sudderth (left), wikipedia (right) Demo2: VISION – lec 1 obj rec 0.mpg Source: http://www.doksinet Robotics Demo 1: ROBOTICS – soccer.avi Demo 2: ROBOTICS – soccer2.avi Demo 3: ROBOTICS – gcar.avi ▪ Robotics ▪ Part mech. eng ▪ Part AI ▪ Reality much harder than simulations! ▪ Technologies ▪ ▪ ▪ ▪ Vehicles Rescue Soccer! Lots of automation ▪ In this class: ▪ We ignore mechanical aspects ▪ Methods for planning ▪ Methods for control Images from UC Berkeley, Boston Dynamics, RoboCup, Google Demo 4: ROBOTICS – laundry.avi Demo 5: ROBOTICS – petman.avi Source: http://www.doksinet Logic ▪ Logical systems ▪ Theorem provers ▪ NASA fault diagnosis ▪ Question answering ▪ Methods:

▪ Deduction systems ▪ Constraint satisfaction ▪ Satisfiability solvers (huge advances!) Image from Bart Selman

Game Playing
▪ Classic Moment: May 1997: Deep Blue vs. Kasparov
▪ First match won against a world champion
▪ “Intelligent creative” play
▪ 200 million board positions per second
▪ Humans understood 99.9% of Deep Blue's moves
▪ Can do about the same now with a PC cluster
▪ Open question: How does human cognition deal with the search space explosion of chess? Or: how can humans compete with computers at all??
▪ 1996: Kasparov beats Deep Blue: “I could feel --- I could smell --- a new kind of intelligence across the table.”
▪ 1997: Deep Blue beats Kasparov: “Deep Blue hasn't proven anything.”
▪ Huge game-playing advances recently, e.g., in Go!
Text from Bart Selman, image from IBM’s Deep Blue pages

Decision Making
▪ Applied AI involves many kinds of automation ▪

Scheduling, e.g airline routing, military ▪ Route planning, e.g Google maps ▪ Medical diagnosis ▪ Web search engines ▪ Spam classifiers ▪ Automated help desks ▪ Fraud detection ▪ Product recommendations ▪ Lots more! Source: http://www.doksinet An agent is an entity that perceives and acts. ▪ A rational agent selects actions that maximize its (expected) utility. ▪ Characteristics of the percepts, environment, and action space dictate techniques for selecting rational actions ▪ This course is about: ▪ General AI techniques for a variety of problem types ▪ Learning to recognize when and how a new problem can be solved with an existing technique Sensors Percepts ? Actuators Actions Environment ▪ Agent Designing Rational Agents Source: http://www.doksinet Pac-Man as an Agent Agent Sensors Environment Percepts ? Actuators Actions Pac-Man is a registered trademark of Namco-Bandai Games, used here for educational purposes Demo1:

pacman-l1.mp4 or L1D2 Source: http://www.doksinet Course Topics ▪ Part I: Making Decisions ▪ Fast search / planning ▪ Constraint satisfaction ▪ Adversarial and uncertain search ▪ Part II: Reasoning under Uncertainty ▪ Bayes’ nets ▪ Decision theory ▪ Machine learning ▪ Throughout: Applications ▪ Natural language, vision, robotics, games, Source: http://www.doksinet CS 188: Artificial Intelligence Search Instructors: Dan Klein and Pieter Abbeel University of California, Berkeley [These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://aiberkeleyedu] Source: http://www.doksinet Today ▪ Agents that Plan Ahead ▪ Search Problems ▪ Uninformed Search Methods ▪ Depth-First Search ▪ Breadth-First Search ▪ Uniform-Cost Search Source: http://www.doksinet Agents that Plan Source: http://www.doksinet Reflex Agents ▪ Reflex agents: ▪ Choose action based on

current percept (and maybe memory) ▪ May have memory or a model of the world’s current state ▪ Do not consider the future consequences of their actions ▪ Consider how the world IS ▪ Can a reflex agent be rational? [Demo: reflex optimal (L2D1)] [Demo: reflex optimal (L2D2)] Source: http://www.doksinet Video of Demo Reflex Optimal Source: http://www.doksinet Video of Demo Reflex Odd Source: http://www.doksinet Planning Agents ▪ Planning agents: ▪ Ask “what if” ▪ Decisions based on (hypothesized) consequences of actions ▪ Must have a model of how the world evolves in response to actions ▪ Must formulate a goal (test) ▪ Consider how the world WOULD BE ▪ Optimal vs. complete planning ▪ Planning vs. replanning [Demo: replanning (L2D3)] [Demo: mastermind (L2D4)] Source: http://www.doksinet Video of Demo Replanning Source: http://www.doksinet Video of Demo Mastermind Source: http://www.doksinet Search Problems Source: http://www.doksinet

Search Problems ▪ A search problem consists of: ▪ A state space ▪ A successor function (with actions, costs) “N”, 1.0 “E”, 1.0 ▪ A start state and a goal test ▪ A solution is a sequence of actions (a plan) which transforms the start state to a goal state Source: http://www.doksinet Search Problems Are Models Source: http://www.doksinet Example: Traveling in Romania ▪ State space: ▪ Cities ▪ Successor function: ▪ Roads: Go to adjacent city with cost = distance ▪ Start state: ▪ Arad ▪ Goal test: ▪ Is state == Bucharest? ▪ Solution? Source: http://www.doksinet What’s in a State Space? The world state includes every last detail of the environment A search state keeps only the details needed for planning (abstraction) ▪ Problem: Pathing ▪ States: (x,y) location ▪ Actions: NSEW ▪ Successor: update location only ▪ Goal test: is (x,y)=END ▪ Problem: Eat-All-Dots ▪ States: {(x,y), dot booleans} ▪ Actions: NSEW ▪

Successor: update location and possibly a dot boolean ▪ Goal test: dots all false Source: http://www.doksinet State Space Sizes? ▪ World state: ▪ ▪ ▪ ▪ Agent positions: 120 Food count: 30 Ghost positions: 12 Agent facing: NSEW ▪ How many ▪ World states? 120x(230)x(122)x4 ▪ States for pathing? 120 ▪ States for eat-all-dots? 120x(230) Source: http://www.doksinet Quiz: Safe Passage ▪ Problem: eat all dots while keeping the ghosts perma-scared ▪ What does the state space have to specify? ▪ (agent position, dot booleans, power pellet booleans, remaining scared time) Source: http://www.doksinet State Space Graphs and Search Trees Source: http://www.doksinet State Space Graphs ▪ State space graph: A mathematical representation of a search problem ▪ Nodes are (abstracted) world configurations ▪ Arcs represent successors (action results) ▪ The goal test is a set of goal nodes (maybe only one) ▪ In a state space graph, each state occurs only

once! ▪ We can rarely build this full graph in memory (it’s too big), but it’s a useful idea Source: http://www.doksinet State Space Graphs ▪ State space graph: A mathematical representation of a search problem G a c b ▪ Nodes are (abstracted) world configurations ▪ Arcs represent successors (action results) ▪ The goal test is a set of goal nodes (maybe only one) e d f S ▪ In a search graph, each state occurs only once! ▪ We can rarely build this full graph in memory (it’s too big), but it’s a useful idea h p q Tiny search graph for a tiny search problem r Source: http://www.doksinet Search Trees This is now / start “N”, 1.0 “E”, 1.0 Possible futures ▪ A search tree: ▪ ▪ ▪ ▪ ▪ A “what if” tree of plans and their outcomes The start state is the root node Children correspond to successors Nodes show states, but correspond to PLANS that achieve those states For most problems, we can never actually build the whole tree
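To make the search-problem formulation from the preceding slides concrete (a state space, a successor function with actions and costs, a start state, and a goal test), here is a minimal Python sketch. It is an illustration, not the course's project code, and the tiny road map and its costs are placeholders in the spirit of the Romania example.

    # Sketch of a search problem as data (names and numbers are hypothetical).
    class GraphSearchProblem:
        def __init__(self, successor_map, start, goal):
            self.successor_map = successor_map   # state -> list of (action, next_state, step_cost)
            self.start = start
            self.goal = goal
        def is_goal(self, state):
            return state == self.goal
        def successors(self, state):
            return self.successor_map.get(state, [])

    # Tiny made-up road map in the spirit of the Romania example.
    roads = {
        "Arad":  [("go-Sibiu", "Sibiu", 140), ("go-Timisoara", "Timisoara", 118)],
        "Sibiu": [("go-Bucharest", "Bucharest", 211)],
    }
    problem = GraphSearchProblem(roads, start="Arad", goal="Bucharest")

A solution is then a sequence of actions returned by a search algorithm that transforms the start state into a state satisfying is_goal.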

Source: http://www.doksinet State Space Graphs vs. Search Trees State Space Graph G a Each NODE in in the search tree is an entire PATH in the state space graph. c b S e d b c a a e p h q r e d f S h p Search Tree q r We construct both on demand – and we construct as little as possible. h p q q c a r p f q G q f c a G Source: http://www.doksinet Quiz: State Space Graphs vs. Search Trees Consider this 4-state graph: How big is its search tree (from S)? a G S b Important: Lots of repeated structure in the search tree! Source: http://www.doksinet Tree Search Source: http://www.doksinet Search Example: Romania Source: http://www.doksinet Searching with a Search Tree ▪ Search: ▪ Expand out potential plans (tree nodes) ▪ Maintain a fringe of partial plans under consideration ▪ Try to expand as few tree nodes as possible Source: http://www.doksinet General Tree Search ▪ Important ideas: ▪ Fringe ▪ Expansion ▪

Exploration strategy ▪ Main question: which fringe nodes to explore?

Example: Tree Search
(example search graph figure omitted)

Depth-First Search
Strategy: expand a deepest node first. Implementation: fringe is a LIFO stack
(example graph and expansion-order figure omitted)

Search Algorithm Properties
▪ Complete: Guaranteed to find a solution if one exists?
▪ Optimal: Guaranteed to find the least-cost path?
▪ Time complexity?
▪ Space complexity?
▪ Cartoon of search tree: b is the branching factor, m is the maximum depth, solutions at various depths; tier k has b^k nodes
▪ Number of nodes in entire tree? 1 + b + b^2 + ... + b^m = O(b^m)

Depth-First Search (DFS) Properties
▪ What nodes does DFS expand?
▪ Some left prefix of the tree. Could process the whole tree!
▪ If m is finite, takes time O(b^m)
▪ How much space does the fringe take?
▪ Only has siblings on path to root, so O(bm)
▪ Is it complete?
▪ m could be infinite, so only if we prevent cycles (more later)
▪ Is it optimal?
▪ No, it finds the “leftmost” solution, regardless of depth or cost

Breadth-First Search
Strategy: expand a shallowest node first. Implementation: fringe is a FIFO queue
(example graph and search-tiers figure omitted)

Breadth-First Search (BFS) Properties
▪ What nodes does BFS expand?
▪ Processes all nodes above the shallowest solution; let the depth of the shallowest solution be s
▪ Search takes time O(b^s)
▪ How much space does the fringe take?
▪ Has roughly the last tier, so O(b^s)
▪ Is it complete?
▪ s must be finite if a solution exists, so yes!
▪ Is it optimal?
▪ Only if costs are all 1 (more on costs later)

Quiz: DFS vs BFS
▪ When will BFS outperform DFS?
▪ When will DFS outperform BFS?
[Demo: dfs/bfs maze water (L2D6)]
Video of Demo Maze Water DFS/BFS (part 1)
Video of Demo Maze Water DFS/BFS (part 2)

Iterative Deepening
▪ Idea: get DFS's space advantage with BFS's time / shallow-solution advantages
▪ Run a DFS with depth limit 1. If no solution
▪ Run a DFS with depth limit 2. If no solution
▪ Run a DFS with depth limit 3. ...
▪ Isn't that wastefully redundant?
▪ Generally most work happens in the lowest level searched, so not so bad!

Cost-Sensitive Search
(example weighted graph figure omitted)
BFS finds the shortest path in terms of number of actions. It does not find the least-cost path. We will now cover a similar algorithm which does find the least-cost path.

Uniform Cost Search
Strategy: expand a cheapest node first. Fringe is a priority queue (priority: cumulative cost)
(example weighted graph and cost-contours figure omitted)

Uniform Cost Search (UCS) Properties
▪ What nodes does UCS expand?
▪ Processes all nodes with cost less than the cheapest solution!
▪ If that solution costs C* and arcs cost at least ε, then the “effective depth” is roughly C*/ε
▪ Takes time O(b^(C*/ε)) (exponential in effective depth)
▪ How much space does the fringe take?
▪ Has roughly the last tier, so O(b^(C*/ε))
▪ Is it complete?
▪ Assuming the best solution has a finite cost and the minimum arc cost is positive, yes!
▪ Is it optimal?
▪ Yes! (Proof next lecture via A*)

Uniform Cost Issues
▪ Remember: UCS explores increasing cost contours
▪ The good: UCS is complete and optimal!
▪ The bad:
▪ Explores options in every “direction”
▪ No information about goal location
▪ We'll fix that soon!
[Demo: empty grid UCS (L2D5)] [Demo: maze with deep/shallow water DFS/BFS/UCS (L2D7)]
Video of Demo Empty UCS
Video of Demo Maze with Deep/Shallow Water --- DFS, BFS, or UCS? (part 1)
Video of Demo Maze with Deep/Shallow Water --- DFS,

BFS, or UCS? (part 2) Source: http://www.doksinet Video of Demo Maze with Deep/Shallow Water --- DFS, BFS, or UCS? (part 3) Source: http://www.doksinet The One Queue ▪ All these search algorithms are the same except for fringe strategies ▪ Conceptually, all fringes are priority queues (i.e collections of nodes with attached priorities) ▪ Practically, for DFS and BFS, you can avoid the log(n) overhead from an actual priority queue, by using stacks and queues ▪ Can even code one implementation that takes a variable queuing object Source: http://www.doksinet Search and Models ▪ Search operates over models of the world ▪ The agent doesn’t actually try all the plans out in the real world! ▪ Planning is all “in simulation” ▪ Your search is only as good as your models Source: http://www.doksinet Search Gone Wrong? Source: http://www.doksinet CS 188: Artificial Intelligence Informed Search Instructors: Dan Klein and Pieter Abbeel University of California,

Berkeley [These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://aiberkeleyedu] Source: http://www.doksinet Today ▪ Informed Search ▪ Heuristics ▪ Greedy Search ▪ A* Search ▪ Graph Search Source: http://www.doksinet Recap: Search Source: http://www.doksinet Recap: Search ▪ Search problem: ▪ ▪ ▪ ▪ States (configurations of the world) Actions and costs Successor function (world dynamics) Start state and goal test ▪ Search tree: ▪ Nodes: represent plans for reaching states ▪ Plans have costs (sum of action costs) ▪ Search algorithm: ▪ Systematically builds a search tree ▪ Chooses an ordering of the fringe (unexplored nodes) ▪ Optimal: finds least-cost plans Source: http://www.doksinet Example: Pancake Problem Cost: Number of pancakes flipped Source: http://www.doksinet Example: Pancake Problem Source: http://www.doksinet Example: Pancake Problem

State space graph with costs as weights 4 2 2 3 3 4 3 4 3 2 2 2 4 3 Source: http://www.doksinet General Tree Search Action: flip top two Cost: 2 Action: fliptoallreach four goal: Path Cost: 4 flip three Flip four, Total cost: 7 Source: http://www.doksinet The One Queue ▪ All these search algorithms are the same except for fringe strategies ▪ Conceptually, all fringes are priority queues (i.e collections of nodes with attached priorities) ▪ Practically, for DFS and BFS, you can avoid the log(n) overhead from an actual priority queue, by using stacks and queues ▪ Can even code one implementation that takes a variable queuing object Source: http://www.doksinet Uninformed Search Source: http://www.doksinet Uniform Cost Search ▪ Strategy: expand lowest path cost c1 c2 c3 ▪ The good: UCS is complete and optimal! ▪ The bad: ▪ Explores options in every “direction” ▪ No information about goal location Start Goal [Demo: contours UCS empty

(L3D1)] [Demo: contours UCS pacman small maze (L3D3)] Source: http://www.doksinet Video of Demo Contours UCS Empty Source: http://www.doksinet Video of Demo Contours UCS Pacman Small Maze Source: http://www.doksinet Informed Search Source: http://www.doksinet Search Heuristics ▪ A heuristic is: ▪ A function that estimates how close a state is to a goal ▪ Designed for a particular search problem ▪ Examples: Manhattan distance, Euclidean distance for pathing 10 5 11.2 Source: http://www.doksinet Example: Heuristic Function h(x) Source: http://www.doksinet Example: Heuristic Function Heuristic: the number of the largest pancake that is still out of place 3 h(x) 4 3 4 3 0 4 4 3 4 4 2 3 Source: http://www.doksinet Greedy Search Source: http://www.doksinet Example: Heuristic Function h(x) Source: http://www.doksinet Greedy Search ▪ Expand the node that seems closest ▪ What can go wrong? Source: http://www.doksinet Greedy Search ▪

Strategy: expand a node that you think is closest to a goal state
▪ Heuristic: estimate of distance to nearest goal for each state
▪ A common case: best-first takes you straight to the (wrong) goal
▪ Worst-case: like a badly-guided DFS
[Demo: contours greedy empty (L3D1)] [Demo: contours greedy pacman small maze (L3D4)]
Video of Demo Contours Greedy (Empty)
Video of Demo Contours Greedy (Pacman Small Maze)

A* Search
(UCS vs. Greedy vs. A* comparison figure omitted)

Combining UCS and Greedy
▪ Uniform-cost orders by path cost, or backward cost g(n)
▪ Greedy orders by goal proximity, or forward cost h(n)
▪ A* Search orders by the sum: f(n) = g(n) + h(n)
(worked example graph with g and h annotations omitted)
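As a concrete rendering of f(n) = g(n) + h(n), here is a minimal A* tree-search sketch in Python (an illustration, not the project code). With h = 0 it behaves like uniform cost search, and ordering by h alone would give greedy search. The 4x4 grid world and its Manhattan-distance heuristic at the bottom are made-up examples.

    import heapq

    def astar(start, is_goal, successors, h):
        # Fringe ordered by f = g + h; each entry carries the cost so far and the plan.
        fringe = [(h(start), 0.0, start, [])]
        while fringe:
            f, g, state, plan = heapq.heappop(fringe)
            if is_goal(state):
                return plan, g
            for action, nxt, step_cost in successors(state):
                g2 = g + step_cost
                heapq.heappush(fringe, (g2 + h(nxt), g2, nxt, plan + [action]))
        return None, float("inf")

    # Hypothetical 4x4 grid with unit step costs and a Manhattan-distance heuristic.
    GOAL = (3, 3)
    def grid_successors(state):
        x, y = state
        moves = [("N", (x, y + 1)), ("S", (x, y - 1)), ("E", (x + 1, y)), ("W", (x - 1, y))]
        return [(a, s, 1) for a, s in moves if 0 <= s[0] <= 3 and 0 <= s[1] <= 3]
    def manhattan(state):
        return abs(state[0] - GOAL[0]) + abs(state[1] - GOAL[1])

    plan, cost = astar((0, 0), lambda s: s == GOAL, grid_successors, manhattan)
    print(plan, cost)   # a 6-step plan of cost 6

This sketch is a tree search (no closed set); the graph-search variant discussed later adds a set of already-expanded states.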

Example: Teg Grenager Source: http://www.doksinet When should A* terminate? ▪ Should we stop when we enqueue a goal? h=2 2 S A 2 h=3 h=0 2 B G 3 h=1 ▪ No: only stop when we dequeue a goal Source: http://www.doksinet Is A* Optimal? h=6 1 S A h=7 3 G 5 ▪ What went wrong? ▪ Actual bad goal cost < estimated good goal cost ▪ We need estimates to be less than actual costs! h=0 Source: http://www.doksinet Admissible Heuristics Source: http://www.doksinet Idea: Admissibility Inadmissible (pessimistic) heuristics break optimality by trapping good plans on the fringe Admissible (optimistic) heuristics slow down bad plans but never outweigh true costs Source: http://www.doksinet Admissible Heuristics ▪ A heuristic h is admissible (optimistic) if: where is the true cost to a nearest goal ▪ Examples: 4 15 ▪ Coming up with admissible heuristics is most of what’s involved in using A* in practice. Source: http://www.doksinet Optimality

of A* Tree Search Source: http://www.doksinet Optimality of A* Tree Search Assume: ▪ A is an optimal goal node ▪ B is a suboptimal goal node ▪ h is admissible Claim: ▪ A will exit the fringe before B Source: http://www.doksinet Optimality of A* Tree Search: Blocking Proof: ▪ Imagine B is on the fringe ▪ Some ancestor n of A is on the fringe, too (maybe A!) ▪ Claim: n will be expanded before B 1. f(n) is less or equal to f(A) Definition of f-cost Admissibility of h h = 0 at a goal Source: http://www.doksinet Optimality of A* Tree Search: Blocking Proof: ▪ Imagine B is on the fringe ▪ Some ancestor n of A is on the fringe, too (maybe A!) ▪ Claim: n will be expanded before B 1. f(n) is less or equal to f(A) 2. f(A) is less than f(B) B is suboptimal h = 0 at a goal Source: http://www.doksinet Optimality of A* Tree Search: Blocking Proof: ▪ Imagine B is on the fringe ▪ Some ancestor n of A is on the fringe, too (maybe A!) ▪ Claim: n will be

expanded before B 1. f(n) is less or equal to f(A) 2. f(A) is less than f(B) 3. n expands before B ▪ All ancestors of A expand before B ▪ A expands before B ▪ A* search is optimal Source: http://www.doksinet Properties of A* Source: http://www.doksinet Properties of A* Uniform-Cost b A* b Source: http://www.doksinet UCS vs A* Contours ▪ Uniform-cost expands equally in all “directions” Start ▪ A* expands mainly toward the goal, but does hedge its bets to ensure optimality Start Goal Goal [Demo: contours UCS / greedy / A* empty (L3D1)] [Demo: contours A* pacman small maze (L3D5)] Source: http://www.doksinet Video of Demo Contours (Empty) -- UCS Source: http://www.doksinet Video of Demo Contours (Empty) -- Greedy Source: http://www.doksinet Video of Demo Contours (Empty) – A* Source: http://www.doksinet Video of Demo Contours (Pacman Small Maze) – A* Source: http://www.doksinet Comparison Greedy Uniform Cost A* Source:
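Spelling out the blocking argument from the optimality proof a few slides back, with g(n) the cost so far, h an admissible heuristic, h* the true cost to the nearest goal, and n an ancestor of A on the optimal path:

    f(n) = g(n) + h(n) <= g(n) + h*(n) = g(A) = f(A)    (admissibility of h; h(A) = 0 at the goal A)
    f(A) = g(A) < g(B) = f(B)                           (B is a suboptimal goal; h(B) = 0)

So f(n) <= f(A) < f(B), and n is dequeued (expanded) before B.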

http://www.doksinet A* Applications Source: http://www.doksinet A* Applications ▪ ▪ ▪ ▪ ▪ ▪ ▪ ▪ Video games Pathing / routing problems Resource planning problems Robot motion planning Language analysis Machine translation Speech recognition [Demo: UCS / A* pacman tiny maze (L3D6,L3D7)] [Demo: guess algorithm Empty Shallow/Deep (L3D8)] Source: http://www.doksinet Video of Demo Pacman (Tiny Maze) – UCS / A* Source: http://www.doksinet Video of Demo Empty Water Shallow/Deep – Guess Algorithm Source: http://www.doksinet Creating Heuristics Source: http://www.doksinet Creating Admissible Heuristics ▪ Most of the work in solving hard search problems optimally is in coming up with admissible heuristics ▪ Often, admissible heuristics are solutions to relaxed problems, where new actions are available 366 15 ▪ Inadmissible heuristics are often useful too Source: http://www.doksinet Example: 8 Puzzle Start State ▪ ▪ ▪ ▪ ▪ Actions What

are the states? How many states? What are the actions? How many successors from the start state? What should the costs be?

8 Puzzle I
▪ Heuristic: Number of tiles misplaced
▪ Why is it admissible?
▪ h(start) = 8
▪ This is a relaxed-problem heuristic
Average nodes expanded when the optimal path has 4 / 8 / 12 steps:
UCS: 112 / 6,300 / 3.6 x 10^6
TILES: 13 / 39 / 227
Statistics from Andrew Moore

8 Puzzle II
▪ What if we had an easier 8-puzzle where any tile could slide in any direction at any time, ignoring other tiles?
▪ Heuristic: Total Manhattan distance
▪ Why is it admissible?
▪ h(start) = 3 + 1 + 2 + ... = 18
Average nodes expanded when the optimal path has 4 / 8 / 12 steps:
TILES: 13 / 39 / 227
MANHATTAN: 12 / 25 / 73

8 Puzzle III
▪ How about using the actual cost as a heuristic?
▪ Would it be admissible?
▪ Would we

save on nodes expanded? ▪ What’s wrong with it? ▪ With A*: a trade-off between quality of estimate and work per node ▪ As heuristics get closer to the true cost, you will expand fewer nodes but usually do more work per node to compute the heuristic itself Source: http://www.doksinet Semi-Lattice of Heuristics Source: http://www.doksinet Trivial Heuristics, Dominance ▪ Dominance: ha ≥ hc if ▪ Heuristics form a semi-lattice: ▪ Max of admissible heuristics is admissible ▪ Trivial heuristics ▪ Bottom of lattice is the zero heuristic (what does this give us?) ▪ Top of lattice is the exact heuristic Source: http://www.doksinet Graph Search Source: http://www.doksinet Tree Search: Extra Work! ▪ Failure to detect repeated states can cause exponentially more work. State Graph Search Tree Source: http://www.doksinet Graph Search ▪ In BFS, for example, we shouldn’t bother expanding the circled nodes (why?) S e d b c a a e h p q q c a h r p

f q G p q r q f c a G Source: http://www.doksinet Graph Search ▪ Idea: never expand a state twice ▪ How to implement: ▪ Tree search + set of expanded states (“closed set”) ▪ Expand the search tree node-by-node, but ▪ Before expanding a node, check to make sure its state has never been expanded before ▪ If not new, skip it, if new add to closed set ▪ Important: store the closed set as a set, not a list ▪ Can graph search wreck completeness? Why/why not? ▪ How about optimality? Source: http://www.doksinet A* Graph Search Gone Wrong? State space graph Search tree A S (0+2) 1 1 h=4 S h=1 h=2 C 1 2 3 B h=1 G h=0 A (1+4) B (1+1) C (2+1) C (3+1) G (5+0) G (6+0) Source: http://www.doksinet Consistency of Heuristics ▪ Main idea: estimated heuristic costs ≤ actual costs ▪ Admissibility: heuristic cost ≤ actual cost to goal A 1 h=4 h=2 h(A) ≤ actual cost from A to G C h=1 ▪ Consistency: heuristic “arc” cost ≤ actual

cost for each arc h(A) – h(C) ≤ cost(A to C) 3 ▪ Consequences of consistency: ▪ The f value along a path never decreases G h(A) ≤ cost(A to C) + h(C) ▪ A* graph search is optimal Source: http://www.doksinet Optimality of A* Graph Search Source: http://www.doksinet Optimality of A* Graph Search ▪ Sketch: consider what A* does with a consistent heuristic: ▪ Fact 1: In tree search, A* expands nodes in increasing total f value (f-contours) ▪ Fact 2: For every state s, nodes that reach s optimally are expanded before nodes that reach s suboptimally ▪ Result: A* graph search is optimal f1 f2 f3 Source: http://www.doksinet Optimality ▪ Tree search: ▪ A* is optimal if heuristic is admissible ▪ UCS is a special case (h = 0) ▪ Graph search: ▪ A* optimal if heuristic is consistent ▪ UCS optimal (h = 0 is consistent) ▪ Consistency implies admissibility ▪ In general, most natural admissible heuristics tend to be consistent, especially if

from relaxed problems Source: http://www.doksinet A*: Summary Source: http://www.doksinet A*: Summary ▪ A* uses both backward costs and (estimates of) forward costs ▪ A* is optimal with admissible / consistent heuristics ▪ Heuristic design is key: often use relaxed problems Source: http://www.doksinet Tree Search Pseudo-Code Source: http://www.doksinet Graph Search Pseudo-Code Source: http://www.doksinet CS 188: Artificial Intelligence Constraint Satisfaction Problems Instructors: Dan Klein and Pieter Abbeel University of California, Berkeley [These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://aiberkeleyedu] Source: http://www.doksinet What is Search For? ▪ Assumptions about the world: a single agent, deterministic actions, fully observed state, discrete state space ▪ Planning: sequences of actions ▪ The path to the goal is the important thing ▪ Paths have various

costs, depths ▪ Heuristics give problem-specific guidance ▪ Identification: assignments to variables ▪ The goal itself is important, not the path ▪ All paths at the same depth (for some formulations) ▪ CSPs are specialized for identification problems Source: http://www.doksinet Constraint Satisfaction Problems Source: http://www.doksinet Constraint Satisfaction Problems ▪ Standard search problems: ▪ State is a “black box”: arbitrary data structure ▪ Goal test can be any function over states ▪ Successor function can also be anything ▪ Constraint satisfaction problems (CSPs): ▪ A special subset of search problems ▪ State is defined by variables Xi with values from a domain D (sometimes D depends on i) ▪ Goal test is a set of constraints specifying allowable combinations of values for subsets of variables ▪ Simple example of a formal representation language ▪ Allows useful general-purpose algorithms with more power than standard search algorithms
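As a minimal sketch of this representation in Python (an illustration, not the course's code; the three-region map is a made-up fragment of the map-coloring example): variables, per-variable domains, and binary constraints expressed as predicates.

    variables = ["WA", "NT", "SA"]
    domains = {v: ["red", "green", "blue"] for v in variables}
    # Binary constraints as predicates over pairs of variables: adjacent regions must differ.
    constraints = {
        ("WA", "NT"): lambda a, b: a != b,
        ("WA", "SA"): lambda a, b: a != b,
        ("NT", "SA"): lambda a, b: a != b,
    }

    def satisfies_constraints(assignment):
        # Goal test: every constraint whose two variables are assigned must hold.
        return all(ok(assignment[x], assignment[y])
                   for (x, y), ok in constraints.items()
                   if x in assignment and y in assignment)

A complete assignment that passes satisfies_constraints is a solution; partial assignments are the search states.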

Source: http://www.doksinet CSP Examples Source: http://www.doksinet Example: Map Coloring ▪ Variables: ▪ Domains: ▪ Constraints: adjacent regions must have different colors Implicit: Explicit: ▪ Solutions are assignments satisfying all constraints, e.g: Source: http://www.doksinet Example: N-Queens ▪ Formulation 1: ▪ Variables: ▪ Domains: ▪ Constraints Source: http://www.doksinet Example: N-Queens ▪ Formulation 2: ▪ Variables: ▪ Domains: ▪ Constraints: Implicit: Explicit: Source: http://www.doksinet Constraint Graphs Source: http://www.doksinet Constraint Graphs ▪ Binary CSP: each constraint relates (at most) two variables ▪ Binary constraint graph: nodes are variables, arcs show constraints ▪ General-purpose CSP algorithms use the graph structure to speed up search. Eg, Tasmania is an independent subproblem! [Demo: CSP applet (made available by aispace.org) -- n-queens] Source: http://www.doksinet Screenshot of Demo N-Queens
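The formulation details on the N-queens slides are figures. One standard reading of formulation 2 (one variable per column, value = the row of that column's queen), written as a hedged Python sketch rather than a transcription of the slide:

    N = 8
    variables = list(range(N))                             # one variable per column
    domains = {col: list(range(N)) for col in variables}   # value = row of the queen in that column

    def queens_compatible(col_i, row_i, col_j, row_j):
        # Pairwise constraint: different rows and different diagonals.
        return row_i != row_j and abs(row_i - row_j) != abs(col_i - col_j)

Compared to formulation 1 (one Boolean per square), this encoding bakes the "one queen per column" requirement into the variables themselves, leaving only pairwise constraints.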

Source: http://www.doksinet Example: Cryptarithmetic ▪ Variables: ▪ Domains: ▪ Constraints: Source: http://www.doksinet Example: Sudoku ▪ ▪ ▪ Variables: ▪ Each (open) square Domains: ▪ {1,2,,9} Constraints: 9-way alldiff for each column 9-way alldiff for each row 9-way alldiff for each region (or can have a bunch of pairwise inequality constraints) Source: http://www.doksinet Example: The Waltz Algorithm ▪ The Waltz algorithm is for interpreting line drawings of solid polyhedra as 3D objects ▪ An early example of an AI computation posed as a CSP ? ▪ Approach: ▪ Each intersection is a variable ▪ Adjacent intersections impose constraints on each other ▪ Solutions are physically realizable 3D interpretations Source: http://www.doksinet Varieties of CSPs and Constraints Source: http://www.doksinet Varieties of CSPs ▪ Discrete Variables ▪ Finite domains ▪ Size d means O(dn) complete assignments ▪ E.g, Boolean CSPs, including Boolean

satisfiability (NPcomplete) ▪ Infinite domains (integers, strings, etc.) ▪ E.g, job scheduling, variables are start/end times for each job ▪ Linear constraints solvable, nonlinear undecidable ▪ Continuous variables ▪ E.g, start/end times for Hubble Telescope observations ▪ Linear constraints solvable in polynomial time by LP methods (see cs170 for a bit of this theory) Source: http://www.doksinet Varieties of Constraints ▪ Varieties of Constraints ▪ Unary constraints involve a single variable (equivalent to reducing domains), e.g: ▪ Binary constraints involve pairs of variables, e.g: ▪ Higher-order constraints involve 3 or more variables: e.g, cryptarithmetic column constraints ▪ Preferences (soft constraints): ▪ ▪ ▪ ▪ E.g, red is better than green Often representable by a cost for each variable assignment Gives constrained optimization problems (We’ll ignore these until we get to Bayes’ nets) Source: http://www.doksinet Real-World CSPs ▪

▪ ▪ ▪ ▪ ▪ ▪ ▪ Assignment problems: e.g, who teaches what class Timetabling problems: e.g, which class is offered when and where? Hardware configuration Transportation scheduling Factory scheduling Circuit layout Fault diagnosis lots more! ▪ Many real-world problems involve real-valued variables Source: http://www.doksinet Solving CSPs Source: http://www.doksinet Standard Search Formulation ▪ Standard search formulation of CSPs ▪ States defined by the values assigned so far (partial assignments) ▪ Initial state: the empty assignment, {} ▪ Successor function: assign a value to an unassigned variable ▪ Goal test: the current assignment is complete and satisfies all constraints ▪ We’ll start with the straightforward, naïve approach, then improve it Source: http://www.doksinet Search Methods ▪ What would BFS do? ▪ What would DFS do? ▪ What problems does naïve search have? [Demo: coloring -- dfs] Source: http://www.doksinet Video of

Demo Coloring -- DFS Source: http://www.doksinet Backtracking Search Source: http://www.doksinet Backtracking Search ▪ Backtracking search is the basic uninformed algorithm for solving CSPs ▪ Idea 1: One variable at a time ▪ Variable assignments are commutative, so fix ordering ▪ I.e, [WA = red then NT = green] same as [NT = green then WA = red] ▪ Only need to consider assignments to a single variable at each step ▪ Idea 2: Check constraints as you go ▪ I.e consider only values which do not conflict previous assignments ▪ Might have to do some computation to check the constraints ▪ “Incremental goal test” ▪ Depth-first search with these two improvements is called backtracking search (not the best name) ▪ Can solve n-queens for n  25 Source: http://www.doksinet Backtracking Example Source: http://www.doksinet Backtracking Search ▪ Backtracking = DFS + variable-ordering + fail-on-violation ▪ What are the choice points? [Demo: coloring --

backtracking] Source: http://www.doksinet Video of Demo Coloring – Backtracking Source: http://www.doksinet Improving Backtracking ▪ General-purpose ideas give huge gains in speed ▪ Ordering: ▪ Which variable should be assigned next? ▪ In what order should its values be tried? ▪ Filtering: Can we detect inevitable failure early? ▪ Structure: Can we exploit the problem structure? Source: http://www.doksinet Filtering Source: http://www.doksinet Filtering: Forward Checking ▪ Filtering: Keep track of domains for unassigned variables and cross off bad options ▪ Forward checking: Cross off values that violate a constraint when added to the existing assignment WA NT Q SA NSW V [Demo: coloring -- forward checking] Source: http://www.doksinet Video of Demo Coloring – Backtracking with Forward Checking Source: http://www.doksinet Filtering: Constraint Propagation ▪ Forward checking propagates information from assigned to unassigned variables, but

doesnt provide early detection for all failures: NT WA SA Q NSW V ▪ NT and SA cannot both be blue! ▪ Why didn’t we detect this yet? ▪ Constraint propagation: reason from constraint to constraint Source: http://www.doksinet Consistency of A Single Arc ▪ An arc X Y is consistent iff for every x in the tail there is some y in the head which could be assigned without violating a constraint NT WA SA Q NSW V Delete from the tail! ▪ Forward checking: Enforcing consistency of arcs pointing to each new assignment Source: http://www.doksinet Arc Consistency of an Entire CSP ▪ A simple form of propagation makes sure all arcs are consistent: NT WA SA Q NSW V ▪ ▪ ▪ ▪ Important: If X loses a value, neighbors of X need to be rechecked! Arc consistency detects failure earlier than forward checking Can be run as a preprocessor or after each assignment What’s the downside of enforcing arc consistency? Remember: Delete from the tail! Source:
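Putting the last few slides together, here is a compact sketch of backtracking search with forward checking, over the same shape of CSP as the map-coloring sketch earlier (a domains dict and a constraints dict of binary predicates). It is an illustration under those assumptions, not the course's project code or the demo applet.

    def forward_check(var, value, domains, constraints):
        # Prune neighbors' domains after tentatively assigning var = value;
        # return the pruned domains, or None if any domain becomes empty.
        pruned = {v: list(vals) for v, vals in domains.items()}
        pruned[var] = [value]
        for (x, y), ok in constraints.items():
            if var not in (x, y):
                continue
            other = y if var == x else x
            pruned[other] = [w for w in pruned[other]
                             if (ok(value, w) if var == x else ok(w, value))]
            if not pruned[other]:
                return None
        return pruned

    def backtrack(assignment, domains, constraints):
        if len(assignment) == len(domains):
            return assignment
        var = next(v for v in domains if v not in assignment)   # plain ordering; no MRV here
        for value in domains[var]:
            pruned = forward_check(var, value, domains, constraints)
            if pruned is not None:
                result = backtrack({**assignment, var: value}, pruned, constraints)
                if result is not None:
                    return result
        return None

    # e.g., backtrack({}, domains, constraints) with the map-coloring CSP sketched earlier.

Replacing forward_check with full arc-consistency enforcement (rechecking neighbors of any variable that loses a value) gives the stronger propagation described above, at more cost per node.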

http://www.doksinet Enforcing Arc Consistency in a CSP ▪ Runtime: O(n2d3), can be reduced to O(n2d2) ▪ but detecting all possible future problems is NP-hard – why? [Demo: CSP applet (made available by aispace.org) -- n-queens] Source: http://www.doksinet Video of Demo Arc Consistency – CSP Applet – n Queens Source: http://www.doksinet Limitations of Arc Consistency ▪ After enforcing arc consistency: ▪ Can have one solution left ▪ Can have multiple solutions left ▪ Can have no solutions left (and not know it) ▪ Arc consistency still runs inside a backtracking search! What went wrong here? [Demo: coloring -- forward checking] [Demo: coloring -- arc consistency] Video of Demo Coloring – Backtracking with Forward Checking – Complex Graph Source: http://www.doksinet Video of Demo Coloring – Backtracking with Arc Consistency – Complex Graph Source: http://www.doksinet Source: http://www.doksinet Ordering Source: http://www.doksinet Ordering:

Minimum Remaining Values ▪ Variable Ordering: Minimum remaining values (MRV): ▪ Choose the variable with the fewest legal left values in its domain ▪ Why min rather than max? ▪ Also called “most constrained variable” ▪ “Fail-fast” ordering Source: http://www.doksinet Ordering: Least Constraining Value ▪ Value Ordering: Least Constraining Value ▪ Given a choice of variable, choose the least constraining value ▪ I.e, the one that rules out the fewest values in the remaining variables ▪ Note that it may take some computation to determine this! (E.g, rerunning filtering) ▪ Why least rather than most? ▪ Combining these ordering ideas makes 1000 queens feasible Source: http://www.doksinet Demo: Coloring -- Backtracking + Forward Checking + Ordering Source: http://www.doksinet CS 188: Artificial Intelligence Constraint Satisfaction Problems II Instructors: Dan Klein and Pieter Abbeel University of California, Berkeley [These slides were created by

Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://aiberkeleyedu] Source: http://www.doksinet Today ▪ Efficient Solution of CSPs ▪ Local Search Source: http://www.doksinet Reminder: CSPs ▪ CSPs: ▪ Variables ▪ Domains ▪ Constraints ▪ Implicit (provide code to compute) ▪ Explicit (provide a list of the legal tuples) ▪ Unary / Binary / N-ary ▪ Goals: ▪ Here: find any solution ▪ Also: find all, find best, etc. Source: http://www.doksinet Backtracking Search Source: http://www.doksinet Improving Backtracking ▪ General-purpose ideas give huge gains in speed ▪ but it’s all still NP-hard ▪ Filtering: Can we detect inevitable failure early? ▪ Ordering: ▪ Which variable should be assigned next? (MRV) ▪ In what order should its values be tried? (LCV) ▪ Structure: Can we exploit the problem structure? Source: http://www.doksinet Arc Consistency and Beyond Source:

http://www.doksinet Arc Consistency of an Entire CSP ▪ A simple form of propagation makes sure all arcs are simultaneously consistent: NT WA SA Q NSW V ▪ Arc consistency detects failure earlier than forward checking ▪ Important: If X loses a value, neighbors of X need to be rechecked! ▪ Must rerun after each assignment! Remember: Delete from the tail! Source: http://www.doksinet Limitations of Arc Consistency ▪ After enforcing arc consistency: ▪ Can have one solution left ▪ Can have multiple solutions left ▪ Can have no solutions left (and not know it) ▪ Arc consistency still runs inside a backtracking search! What went wrong here? Source: http://www.doksinet K-Consistency Source: http://www.doksinet K-Consistency ▪ Increasing degrees of consistency ▪ 1-Consistency (Node Consistency): Each single node’s domain has a value which meets that node’s unary constraints ▪ 2-Consistency (Arc Consistency): For each pair of nodes, any consistent

assignment to one can be extended to the other ▪ K-Consistency: For each k nodes, any consistent assignment to k-1 can be extended to the kth node. ▪ Higher k more expensive to compute ▪ (You need to know the k=2 case: arc consistency) Source: http://www.doksinet Strong K-Consistency ▪ Strong k-consistency: also k-1, k-2, 1 consistent ▪ Claim: strong n-consistency means we can solve without backtracking! ▪ Why? ▪ ▪ ▪ ▪ ▪ ▪ Choose any assignment to any variable Choose a new variable By 2-consistency, there is a choice consistent with the first Choose a new variable By 3-consistency, there is a choice consistent with the first 2 ▪ Lots of middle ground between arc consistency and n-consistency! (e.g k=3, called path consistency) Source: http://www.doksinet Structure Source: http://www.doksinet Problem Structure ▪ Extreme case: independent subproblems ▪ Example: Tasmania and mainland do not interact ▪ Independent subproblems are identifiable

as connected components of the constraint graph
▪ Suppose a graph of n variables can be broken into subproblems of only c variables:
▪ Worst-case solution cost is O((n/c)(d^c)), linear in n
▪ E.g., n = 80, d = 2, c = 20
▪ 2^80 = 4 billion years at 10 million nodes/sec
▪ (4)(2^20) = 0.4 seconds at 10 million nodes/sec

Tree-Structured CSPs
▪ Theorem: if the constraint graph has no loops, the CSP can be solved in O(n d^2) time
▪ Compare to general CSPs, where worst-case time is O(d^n)
▪ This property also applies to probabilistic reasoning (later): an example of the relation between syntactic restrictions and the complexity of reasoning

Tree-Structured CSPs
▪ Algorithm for tree-structured CSPs:
▪ Order: Choose a root variable, order variables so that parents precede children
▪ Remove backward: For i = n : 2, apply RemoveInconsistent(Parent(Xi), Xi)
▪ Assign forward: For i = 1 : n, assign Xi consistently with

Parent(Xi) ▪ Runtime: O(n d2) (why?) Source: http://www.doksinet Tree-Structured CSPs ▪ Claim 1: After backward pass, all root-to-leaf arcs are consistent ▪ Proof: Each XY was made consistent at one point and Y’s domain could not have been reduced thereafter (because Y’s children were processed before Y) ▪ Claim 2: If root-to-leaf arcs are consistent, forward assignment will not backtrack ▪ Proof: Induction on position ▪ Why doesn’t this algorithm work with cycles in the constraint graph? ▪ Note: we’ll see this basic idea again with Bayes’ nets Source: http://www.doksinet Improving Structure Source: http://www.doksinet Nearly Tree-Structured CSPs ▪ Conditioning: instantiate a variable, prune its neighbors domains ▪ Cutset conditioning: instantiate (in all ways) a set of variables such that the remaining constraint graph is a tree ▪ Cutset size c gives runtime O( (dc) (n-c) d2 ), very fast for small c Source: http://www.doksinet Cutset

Conditioning Choose a cutset SA Instantiate the cutset (all possible ways) SA Compute residual CSP for each assignment Solve the residual CSPs (tree structured) SA SA Source: http://www.doksinet Cutset Quiz ▪ Find the smallest cutset for the graph below. Source: http://www.doksinet Tree Decomposition* ▪ Idea: create a tree-structured graph of mega-variables ▪ Each mega-variable encodes part of the original CSP ▪ Subproblems overlap to ensure consistent solutions M1 M2  SA {(NT=r,SA=g,Q=b), (NT=b,SA=g,Q=r), }  NS W  SA shared vars {(WA=r,SA=g,NT=b), (WA=b,SA=r,NT=g), }   Q M4 Agree on SA Q shared vars   NT Agree on  shared vars NT Agree on  WA M3 NS W  V   SA Agree: (M1,M2)  {((WA=g,SA=g,NT=g), (NT=g,SA=g,Q=g)), } Source: http://www.doksinet Iterative Improvement Source: http://www.doksinet Iterative Algorithms for CSPs ▪ Local search methods typically work with “complete” states, i.e, all

variables assigned ▪ To apply to CSPs: ▪ Take an assignment with unsatisfied constraints ▪ Operators reassign variable values ▪ No fringe! Live on the edge. ▪ Algorithm: While not solved, ▪ Variable selection: randomly select any conflicted variable ▪ Value selection: min-conflicts heuristic: ▪ Choose a value that violates the fewest constraints ▪ I.e, hill climb with h(n) = total number of violated constraints Source: http://www.doksinet Example: 4-Queens ▪ ▪ ▪ ▪ States: 4 queens in 4 columns (44 = 256 states) Operators: move queen in column Goal test: no attacks Evaluation: c(n) = number of attacks [Demo: n-queens – iterative improvement (L5D1)] [Demo: coloring – iterative improvement] Source: http://www.doksinet Video of Demo Iterative Improvement – n Queens Source: http://www.doksinet Video of Demo Iterative Improvement – Coloring Source: http://www.doksinet Performance of Min-Conflicts ▪ Given random initial state, can solve

n-queens in almost constant time for arbitrary n with high probability (e.g, n = 10,000,000)! ▪ The same appears to be true for any randomly-generated CSP except in a narrow range of the ratio Source: http://www.doksinet Summary: CSPs ▪ CSPs are a special kind of search problem: ▪ States are partial assignments ▪ Goal test defined by constraints ▪ Basic solution: backtracking search ▪ Speed-ups: ▪ Ordering ▪ Filtering ▪ Structure ▪ Iterative min-conflicts is often effective in practice Source: http://www.doksinet Local Search Source: http://www.doksinet Local Search ▪ Tree search keeps unexplored alternatives on the fringe (ensures completeness) ▪ Local search: improve a single option until you can’t make it better (no fringe!) ▪ New successor function: local changes ▪ Generally much faster and more memory efficient (but incomplete and suboptimal) Source: http://www.doksinet Hill Climbing ▪ Simple, general idea: ▪ Start wherever ▪

Repeat: move to the best neighboring state ▪ If no neighbors better than current, quit ▪ What’s bad about this approach? ▪ Complete? ▪ Optimal? ▪ What’s good about it? Source: http://www.doksinet Hill Climbing Diagram Source: http://www.doksinet Hill Climbing Quiz Starting from X, where do you end up ? Starting from Y, where do you end up ? Starting from Z, where do you end up ? Source: http://www.doksinet Simulated Annealing ▪ Idea: Escape local maxima by allowing downhill moves ▪ But make them rarer as time goes on 38 Source: http://www.doksinet Simulated Annealing ▪ Theoretical guarantee: ▪ Stationary distribution: ▪ If T decreased slowly enough, will converge to optimal state! ▪ Is this an interesting guarantee? ▪ Sounds like magic, but reality is reality: ▪ The more downhill steps you need to escape a local optimum, the less likely you are to ever make them all in a row ▪ People think hard about ridge operators which let you jump

around the space in better ways Source: http://www.doksinet Genetic Algorithms ▪ Genetic algorithms use a natural selection metaphor ▪ Keep best N hypotheses at each step (selection) based on a fitness function ▪ Also have pairwise crossover operators, with optional mutation to give variety ▪ Possibly the most misunderstood, misapplied (and even maligned) technique around Source: http://www.doksinet Example: N-Queens ▪ ▪ ▪ ▪ Why does crossover make sense here? When wouldn’t it make sense? What would mutation be? What would a good fitness function be? Source: http://www.doksinet Next Time: Adversarial Search! Source: http://www.doksinet CS 188: Artificial Intelligence Adversarial Search Instructors: Dan Klein and Pieter Abbeel University of California, Berkeley [These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://aiberkeleyedu] Source: http://www.doksinet Game

Playing State-of-the-Art ▪ Checkers: 1950: First computer player. 1994: First computer champion: Chinook ended 40-year-reign of human champion Marion Tinsley using complete 8-piece endgame. 2007: Checkers solved! ▪ Chess: 1997: Deep Blue defeats human champion Gary Kasparov in a six-game match. Deep Blue examined 200M positions per second, used very sophisticated evaluation and undisclosed methods for extending some lines of search up to 40 ply. Current programs are even better, if less historic. ▪ Go: Human champions are now starting to be challenged by machines, though the best humans still beat the best machines. In go, b > 300! Classic programs use pattern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion methods. ▪ Pacman Source: http://www.doksinet Behavior from Computation [Demo: mystery pacman (L6D1)] Source: http://www.doksinet Video of Demo Mystery Pacman Source: http://www.doksinet Adversarial Games Source:

http://www.doksinet Types of Games ▪ Many different kinds of games! ▪ Axes: ▪ ▪ ▪ ▪ Deterministic or stochastic? One, two, or more players? Zero sum? Perfect information (can you see the state)? ▪ Want algorithms for calculating a strategy (policy) which recommends a move from each state Source: http://www.doksinet Deterministic Games ▪ Many possible formalizations, one is: ▪ ▪ ▪ ▪ ▪ ▪ States: S (start at s0) Players: P={1.N} (usually take turns) Actions: A (may depend on player / state) Transition Function: SxA S Terminal Test: S {t,f} Terminal Utilities: SxP R ▪ Solution for a player is a policy: S A Source: http://www.doksinet Zero-Sum Games ▪ Zero-Sum Games ▪ Agents have opposite utilities (values on outcomes) ▪ Lets us think of a single value that one maximizes and the other minimizes ▪ Adversarial, pure competition ▪ General Games ▪ Agents have independent utilities (values on outcomes) ▪ Cooperation, indifference,

competition, and more are all possible ▪ More later on non-zero-sum games Source: http://www.doksinet Adversarial Search Source: http://www.doksinet Single-Agent Trees 8 2 0 2 6 4 6 Source: http://www.doksinet Value of a State Value of a state: The best achievable outcome (utility) from that state Non-Terminal States: 8 2 0 2 6 4 6 Terminal States: Source: http://www.doksinet Adversarial Game Trees -20 -8 -18 -5 -10 +4 -20 +8 Source: http://www.doksinet Minimax Values States Under Agent’s Control: -8 States Under Opponent’s Control: -5 -10 Terminal States: +8 Source: http://www.doksinet Tic-Tac-Toe Game Tree Source: http://www.doksinet Adversarial Search (Minimax) ▪ Deterministic, zero-sum games: ▪ Tic-tac-toe, chess, checkers ▪ One player maximizes result ▪ The other minimizes result ▪ Minimax search: ▪ A state-space search tree ▪ Players alternate turns ▪ Compute each node’s minimax value: the best

achievable utility against a rational (optimal) adversary Minimax values: computed recursively 5 max 5 2 8 2 5 Terminal values: part of the game min 6 Source: http://www.doksinet Minimax Implementation def max-value(state): initialize v = -∞ for each successor of state: v = max(v, min-value(successor)) return v def min-value(state): initialize v = +∞ for each successor of state: v = min(v, max-value(successor)) return v Source: http://www.doksinet Minimax Implementation (Dispatch) def value(state): if the state is a terminal state: return the state’s utility if the next agent is MAX: return max-value(state) if the next agent is MIN: return min-value(state) def max-value(state): initialize v = -∞ for each successor of state: v = max(v, value(successor)) return v def min-value(state): initialize v = +∞ for each successor of state: v = min(v, value(successor)) return v Source: http://www.doksinet Minimax Example 3 12 8 2 4 6 14 5 2 Source:
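A compact, runnable rendering of the max-value / min-value recursion above, assuming the game tree is given explicitly as nested lists (a hypothetical representation; real game code would generate successors from a state instead). The example numbers are the ones from the Minimax Example slide.

    def minimax(node, maximizing=True):
        # A leaf is a terminal utility; an internal node is a list of successor subtrees.
        if not isinstance(node, list):
            return node
        values = [minimax(child, not maximizing) for child in node]
        return max(values) if maximizing else min(values)

    # Minimax Example: root is MAX over three MIN nodes; the root value is 3.
    tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
    assert minimax(tree) == 3

The depth-limited variant discussed a few slides later stops the recursion at a fixed depth and returns an evaluation-function estimate instead of a terminal utility.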

http://www.doksinet Minimax Efficiency ▪ How efficient is minimax? ▪ Just like (exhaustive) DFS ▪ Time: O(bm) ▪ Space: O(bm) ▪ Example: For chess, b  35, m  100 ▪ Exact solution is completely infeasible ▪ But, do we need to explore the whole tree? Source: http://www.doksinet Minimax Properties max min 10 10 9 100 Optimal against a perfect player. Otherwise? [Demo: min vs exp (L6D2, L6D3)] Source: http://www.doksinet Video of Demo Min vs. Exp (Min) Source: http://www.doksinet Video of Demo Min vs. Exp (Exp) Source: http://www.doksinet Resource Limits Source: http://www.doksinet Resource Limits ▪ Problem: In realistic games, cannot search to leaves! ▪ Solution: Depth-limited search ▪ Instead, search only to a limited depth in the tree ▪ Replace terminal utilities with an evaluation function for non-terminal positions max 4 -2 -1 4 -2 4 ? ? min 9 ▪ Example: ▪ Suppose we have 100 seconds, can explore 10K nodes / sec ▪ So

can check 1M nodes per move ▪ - reaches about depth 8 – decent chess program ▪ Guarantee of optimal play is gone ▪ More plies makes a BIG difference ▪ Use iterative deepening for an anytime algorithm ? ? Source: http://www.doksinet Depth Matters ▪ Evaluation functions are always imperfect ▪ The deeper in the tree the evaluation function is buried, the less the quality of the evaluation function matters ▪ An important example of the tradeoff between complexity of features and complexity of computation [Demo: depth limited (L6D4, L6D5)] Source: http://www.doksinet Video of Demo Limited Depth (2) Source: http://www.doksinet Video of Demo Limited Depth (10) Source: http://www.doksinet Evaluation Functions Source: http://www.doksinet Evaluation Functions ▪ Evaluation functions score non-terminals in depth-limited search ▪ Ideal function: returns the actual minimax value of the position ▪ In practice: typically weighted linear sum of features:

▪ e.g f1(s) = (num white queens – num black queens), etc Source: http://www.doksinet Evaluation for Pacman [Demo: thrashing d=2, thrashing d=2 (fixed evaluation function), smart ghosts coordinate (L6D6,7,8,10)] Source: http://www.doksinet Video of Demo Thrashing (d=2) Source: http://www.doksinet Why Pacman Starves ▪ A danger of replanning agents! ▪ ▪ ▪ ▪ He knows his score will go up by eating the dot now (west, east) He knows his score will go up just as much by eating the dot later (east, west) There are no point-scoring opportunities after eating the dot (within the horizon, two here) Therefore, waiting seems just as good as eating: he may go east, then back west in the next round of replanning! Source: http://www.doksinet Video of Demo Thrashing -- Fixed (d=2) Source: http://www.doksinet Video of Demo Smart Ghosts (Coordination) Source: http://www.doksinet Video of Demo Smart Ghosts (Coordination) – Zoomed In Source: http://www.doksinet Game

Tree Pruning Source: http://www.doksinet Minimax Example 3 12 8 2 4 6 14 5 2 Source: http://www.doksinet Minimax Pruning 3 12 8 2 14 5 2 Source: http://www.doksinet Alpha-Beta Pruning ▪ General configuration (MIN version) ▪ We’re computing the MIN-VALUE at some node n MAX ▪ We’re looping over n’s children ▪ n’s estimate of the childrens’ min is dropping MIN a ▪ Who cares about n’s value? MAX ▪ Let a be the best value that MAX can get at any choice point along the current path from the root ▪ If n becomes worse than a, MAX will avoid it, so we can stop considering n’s other children (it’s already bad enough that it won’t be played) ▪ MAX version is symmetric MAX MIN n Source: http://www.doksinet Alpha-Beta Implementation α: MAX’s best option on path to root β: MIN’s best option on path to root def max-value(state, α, β): initialize v = -∞ for each successor of state: v = max(v, value(successor, α, β)) if

v ≥ β return v α = max(α, v) return v def min-value(state , α, β): initialize v = +∞ for each successor of state: v = min(v, value(successor, α, β)) if v ≤ α return v β = min(β, v) return v Source: http://www.doksinet Alpha-Beta Pruning Properties ▪ This pruning has no effect on minimax value computed for the root! ▪ Values of intermediate nodes might be wrong max ▪ Important: children of the root may have the wrong value ▪ So the most naïve version won’t let you do action selection min ▪ Good child ordering improves effectiveness of pruning ▪ With “perfect ordering”: ▪ Time complexity drops to O(bm/2) ▪ Doubles solvable depth! ▪ Full search of, e.g chess, is still hopeless 10 10 ▪ This is a simple example of metareasoning (computing about what to compute) 0 Source: http://www.doksinet Alpha-Beta Quiz Source: http://www.doksinet Alpha-Beta Quiz 2 Source: http://www.doksinet Next Time: Uncertainty! Source:
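Putting the α-β pseudocode above into a single Python function gives the sketch below; it assumes the same hypothetical game-state interface as the earlier minimax sketch, and only the pruning tests (v ≥ β under a MAX ancestor, v ≤ α under a MIN ancestor) differ from plain minimax.

def alpha_beta_value(state, alpha=float('-inf'), beta=float('inf')):
    # alpha: best value MAX can guarantee on the path to the root
    # beta:  best value MIN can guarantee on the path to the root
    if state.is_terminal():
        return state.utility()
    if state.to_move() == 'MAX':
        v = float('-inf')
        for s in state.successors():
            v = max(v, alpha_beta_value(s, alpha, beta))
            if v >= beta:            # MIN above would never let play reach here
                return v
            alpha = max(alpha, v)
        return v
    else:
        v = float('inf')
        for s in state.successors():
            v = min(v, alpha_beta_value(s, alpha, beta))
            if v <= alpha:           # MAX above would never let play reach here
                return v
            beta = min(beta, v)
        return v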

http://www.doksinet CS 188: Artificial Intelligence Uncertainty and Utilities Instructors: Dan Klein and Pieter Abbeel University of California, Berkeley [These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://aiberkeleyedu] Source: http://www.doksinet Uncertain Outcomes Source: http://www.doksinet Worst-Case vs. Average Case max min 10 10 9 100 Idea: Uncertain outcomes controlled by chance, not an adversary! Source: http://www.doksinet Expectimax Search ▪ Why wouldn’t we know what the result of an action will be? ▪ Explicit randomness: rolling dice ▪ Unpredictable opponents: the ghosts respond randomly ▪ Actions can fail: when moving a robot, wheels might slip max ▪ Values should now reflect average-case (expectimax) outcomes, not worst-case (minimax) outcomes ▪ Expectimax search: compute the average score under optimal play ▪ ▪ ▪ ▪ Max nodes as in minimax

search Chance nodes are like min nodes but the outcome is uncertain Calculate their expected utilities I.e take weighted average (expectation) of children chance 10 10 4 5 9 100 7 ▪ Later, we’ll learn how to formalize the underlying uncertainresult problems as Markov Decision Processes [Demo: min vs exp (L7D1,2)] Source: http://www.doksinet Video of Demo Minimax vs Expectimax (Min) Source: http://www.doksinet Video of Demo Minimax vs Expectimax (Exp) Source: http://www.doksinet Expectimax Pseudocode def value(state): if the state is a terminal state: return the state’s utility if the next agent is MAX: return max-value(state) if the next agent is EXP: return exp-value(state) def max-value(state): initialize v = -∞ for each successor of state: v = max(v, value(successor)) return v def exp-value(state): initialize v = 0 for each successor of state: p = probability(successor) v += p * value(successor) return v Source: http://www.doksinet Expectimax Pseudocode

def exp-value(state): initialize v = 0 for each successor of state: p = probability(successor) v += p * value(successor) return v 1/2 1/3 5 8 v = (1/2) (8) + (1/3) (24) + (1/6) (-12) = 10 24 7 1/6 -12 Source: http://www.doksinet Expectimax Example 3 12 9 2 4 6 15 6 0 Source: http://www.doksinet Expectimax Pruning? 3 12 9 2 Source: http://www.doksinet Depth-Limited Expectimax 400 300 492 Estimate of true expectimax value (which would require a lot of work to compute) 362 Source: http://www.doksinet Probabilities Source: http://www.doksinet Reminder: Probabilities ▪ A random variable represents an event whose outcome is unknown ▪ A probability distribution is an assignment of weights to outcomes ▪ Example: Traffic on freeway 0.25 ▪ Random variable: T = whether there’s traffic ▪ Outcomes: T in {none, light, heavy} ▪ Distribution: P(T=none) = 0.25, P(T=light) = 050, P(T=heavy) = 025 ▪ Some laws of probability (more later): ▪

Probabilities are always non-negative ▪ Probabilities over all possible outcomes sum to one 0.50 ▪ As we get more evidence, probabilities may change: ▪ P(T=heavy) = 0.25, P(T=heavy | Hour=8am) = 060 ▪ We’ll talk about methods for reasoning and updating probabilities later 0.25 Source: http://www.doksinet Reminder: Expectations ▪ The expected value of a function of a random variable is the average, weighted by the probability distribution over outcomes ▪ Example: How long to get to the airport? Time: 20 min x Probability: 0.25 + 30 min x 0.50 + 60 min x 0.25 35 min Source: http://www.doksinet What Probabilities to Use? ▪ In expectimax search, we have a probabilistic model of how the opponent (or environment) will behave in any state ▪ Model could be a simple uniform distribution (roll a die) ▪ Model could be sophisticated and require a great deal of computation ▪ We have a chance node for any outcome out of our control: opponent or environment
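As a concrete illustration of the expectation taken at a chance node (matching the exp-value pseudocode and the airport example above), here is a minimal Python sketch; the outcome and probability lists, and the state interface used by expectimax_value, are assumptions for illustration only.

def expected_value(outcomes, probs):
    # weighted average of outcomes under the given distribution
    return sum(p * x for p, x in zip(probs, outcomes))

# the airport example above: 0.25*20 + 0.50*30 + 0.25*60 = 35.0 minutes
print(expected_value([20, 30, 60], [0.25, 0.50, 0.25]))

def expectimax_value(state):
    # Hypothetical interface: is_terminal(), utility(), to_move(), and
    # successors() yielding (probability, next_state) pairs at chance nodes.
    if state.is_terminal():
        return state.utility()
    if state.to_move() == 'MAX':
        return max(expectimax_value(s) for _, s in state.successors())
    return sum(p * expectimax_value(s) for p, s in state.successors())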

▪ The model might say that adversarial actions are likely! ▪ For now, assume each chance node magically comes along with probabilities that specify the distribution over its outcomes Having a probabilistic belief about another agent’s action does not mean that the agent is flipping any coins! Source: http://www.doksinet Quiz: Informed Probabilities ▪ Let’s say you know that your opponent is actually running a depth 2 minimax, using the result 80% of the time, and moving randomly otherwise ▪ Question: What tree search should you use? ▪ Answer: Expectimax! 0.1 0.9 ▪ To figure out EACH chance node’s probabilities, you have to run a simulation of your opponent ▪ This kind of thing gets very slow very quickly ▪ Even worse if you have to simulate your opponent simulating you ▪ except for minimax, which has the nice property that it all collapses into one game tree Source: http://www.doksinet Modeling Assumptions Source: http://www.doksinet The Dangers of

Optimism and Pessimism Dangerous Optimism Dangerous Pessimism Assuming chance when the world is adversarial Assuming the worst case when it’s not likely Source: http://www.doksinet Assumptions vs. Reality Adversarial Ghost Random Ghost Won 5/5 Won 5/5 Avg. Score: 483 Avg. Score: 493 Won 1/5 Won 5/5 Avg. Score: -303 Avg. Score: 503 Minimax Pacman Expectimax Pacman Results from playing 5 games Pacman used depth 4 search with an eval function that avoids trouble Ghost used depth 2 search with an eval function that seeks Pacman [Demos: world assumptions (L7D3,4,5,6)] Source: http://www.doksinet Video of Demo World Assumptions Random Ghost – Expectimax Pacman Source: http://www.doksinet Video of Demo World Assumptions Adversarial Ghost – Minimax Pacman Source: http://www.doksinet Video of Demo World Assumptions Adversarial Ghost – Expectimax Pacman Source: http://www.doksinet Video of Demo World Assumptions Random Ghost – Minimax Pacman Source:

http://www.doksinet Other Game Types Source: http://www.doksinet Mixed Layer Types ▪ E.g Backgammon ▪ Expectiminimax ▪ Environment is an extra “random agent” player that moves after each min/max agent ▪ Each node computes the appropriate combination of its children Source: http://www.doksinet Example: Backgammon ▪ Dice rolls increase b: 21 possible rolls with 2 dice ▪ Backgammon ≈ 20 legal moves ▪ Depth 2 = 20 x (21 x 20)^3 = 1.2 x 10^9 ▪ As depth increases, probability of reaching a given search node shrinks ▪ So usefulness of search is diminished ▪ So limiting depth is less damaging ▪ But pruning is trickier ▪ Historic AI: TDGammon uses depth-2 search + very good evaluation function + reinforcement learning: world-champion level play ▪ 1st AI world champion in any game! Image: Wikipedia Source: http://www.doksinet Multi-Agent Utilities ▪ What if the game is not zero-sum, or has multiple players? ▪ Generalization of minimax: ▪ ▪ ▪

▪ Terminals have utility tuples Node values are also utility tuples Each player maximizes its own component Can give rise to cooperation and competition dynamically 1,6,6 7,1,2 6,1,2 7,2,1 5,1,7 1,5,2 7,7,1 5,2,5 Source: http://www.doksinet Utilities Source: http://www.doksinet Maximum Expected Utility ▪ Why should we average utilities? Why not minimax? ▪ Principle of maximum expected utility: ▪ A rational agent should chose the action that maximizes its expected utility, given its knowledge ▪ Questions: ▪ ▪ ▪ ▪ Where do utilities come from? How do we know such utilities even exist? How do we know that averaging even makes sense? What if our behavior (preferences) can’t be described by utilities? Source: http://www.doksinet What Utilities to Use? 0 40 20 30 x2 0 1600 400 900 ▪ For worst-case minimax reasoning, terminal function scale doesn’t matter ▪ We just want better states to have higher evaluations (get the ordering right)

▪ We call this insensitivity to monotonic transformations ▪ For average-case expectimax reasoning, we need magnitudes to be meaningful Source: http://www.doksinet Utilities ▪ Utilities are functions from outcomes (states of the world) to real numbers that describe an agent’s preferences ▪ Where do utilities come from? ▪ In a game, may be simple (+1/-1) ▪ Utilities summarize the agent’s goals ▪ Theorem: any “rational” preferences can be summarized as a utility function ▪ We hard-wire utilities and let behaviors emerge ▪ Why don’t we let agents pick utilities? ▪ Why don’t we prescribe behaviors? Source: http://www.doksinet Utilities: Uncertain Outcomes Getting ice cream Get Single Get Double Oops Whew! Source: http://www.doksinet Preferences ▪ An agent must have preferences among: ▪ Prizes: A, B, etc. ▪ Lotteries: situations with uncertain prizes A Prize A Lottery A p A ▪ Notation: ▪ Preference: ▪ Indifference: 1-p B

Source: http://www.doksinet Rationality Source: http://www.doksinet Rational Preferences ▪ We want some constraints on preferences before we call them rational, such as: Axiom of Transitivity: ( A  B)  ( B  C )  ( A  C ) ▪ For example: an agent with intransitive preferences can be induced to give away all of its money ▪ If B > C, then an agent with C would pay (say) 1 cent to get B ▪ If A > B, then an agent with B would pay (say) 1 cent to get A ▪ If C > A, then an agent with A would pay (say) 1 cent to get C Source: http://www.doksinet Rational Preferences The Axioms of Rationality Theorem: Rational preferences imply behavior describable as maximization of expected utility Source: http://www.doksinet MEU Principle ▪ Theorem [Ramsey, 1931; von Neumann & Morgenstern, 1944] ▪ Given any preferences satisfying these constraints, there exists a real-valued function U such that: ▪ I.e values assigned by U preserve preferences of both

prizes and lotteries! ▪ Maximum expected utility (MEU) principle: ▪ Choose the action that maximizes expected utility ▪ Note: an agent can be entirely rational (consistent with MEU) without ever representing or manipulating utilities and probabilities ▪ E.g, a lookup table for perfect tic-tac-toe, a reflex vacuum cleaner Source: http://www.doksinet Human Utilities Source: http://www.doksinet Utility Scales ▪ Normalized utilities: u+ = 1.0, u- = 00 ▪ Micromorts: one-millionth chance of death, useful for paying to reduce product risks, etc. ▪ QALYs: quality-adjusted life years, useful for medical decisions involving substantial risk ▪ Note: behavior is invariant under positive linear transformation ▪ With deterministic prizes only (no lottery choices), only ordinal utility can be determined, i.e, total order on prizes Source: http://www.doksinet Human Utilities ▪ Utilities map states to real numbers. Which numbers? ▪ Standard approach to assessment

(elicitation) of human utilities: ▪ Compare a prize A to a standard lottery Lp between ▪ “best possible prize” u+ with probability p ▪ “worst possible catastrophe” u- with probability 1-p ▪ Adjust lottery probability p until indifference: A ~ Lp ▪ Resulting p is a utility in [0,1] Pay $30 0.999999 0.000001 No change Instant death Source: http://www.doksinet Money ▪ Money does not behave as a utility function, but we can talk about the utility of having money (or being in debt) ▪ Given a lottery L = [p, $X; (1-p), $Y] ▪ The expected monetary value EMV(L) is p*X + (1-p)Y ▪ U(L) = p*U($X) + (1-p)U($Y) ▪ Typically, U(L) < U( EMV(L) ) ▪ In this sense, people are risk-averse ▪ When deep in debt, people are risk-prone Source: http://www.doksinet Example: Insurance ▪ Consider the lottery [0.5, $1000; 05, $0] ▪ What is its expected monetary value? ($500) ▪ What is its certainty equivalent? ▪ Monetary value acceptable in lieu of lottery ▪

$400 for most people ▪ Difference of $100 is the insurance premium ▪ There’s an insurance industry because people will pay to reduce their risk ▪ If everyone were risk-neutral, no insurance needed! ▪ It’s win-win: you’d rather have the $400 and the insurance company would rather have the lottery (their utility curve is flat and they have many lotteries) Source: http://www.doksinet Example: Human Rationality? ▪ Famous example of Allais (1953) ▪ A: [0.8, $4k; 02, $0] ▪ B: [1.0, $3k; 00, $0] ▪ C: [0.2, $4k; 08, $0] ▪ D: [0.25, $3k; 075, $0] ▪ Most people prefer B > A, C > D ▪ But if U($0) = 0, then ▪ B > A  U($3k) > 0.8 U($4k) ▪ C > D  0.8 U($4k) > U($3k) Source: http://www.doksinet Next Time: MDPs! Source: http://www.doksinet CS 188: Artificial Intelligence Markov Decision Processes Instructors: Dan Klein and Pieter Abbeel University of California, Berkeley [These slides were created by Dan Klein and Pieter Abbeel for

CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://aiberkeleyedu] Source: http://www.doksinet Non-Deterministic Search Source: http://www.doksinet Example: Grid World ▪ A maze-like problem ▪ ▪ The agent lives in a grid Walls block the agent’s path ▪ Noisy movement: actions do not always go as planned ▪ ▪ ▪ 80% of the time, the action North takes the agent North (if there is no wall there) 10% of the time, North takes the agent West; 10% East If there is a wall in the direction the agent would have been taken, the agent stays put ▪ The agent receives rewards each time step ▪ ▪ Small “living” reward each step (can be negative) Big rewards come at the end (good or bad) ▪ Goal: maximize sum of rewards Source: http://www.doksinet Grid World Actions Deterministic Grid World Stochastic Grid World Source: http://www.doksinet Markov Decision Processes ▪ An MDP is defined by: ▪ A set of states s  S ▪ A set of

actions a  A ▪ A transition function T(s, a, s’) ▪ Probability that a from s leads to s’, i.e, P(s’| s, a) ▪ Also called the model or the dynamics ▪ A reward function R(s, a, s’) ▪ Sometimes just R(s) or R(s’) ▪ A start state ▪ Maybe a terminal state ▪ MDPs are non-deterministic search problems ▪ One way to solve them is with expectimax search ▪ We’ll have a new tool soon [Demo – gridworld manual intro (L8D1)] Source: http://www.doksinet Video of Demo Gridworld Manual Intro Source: http://www.doksinet What is Markov about MDPs? ▪ “Markov” generally means that given the present state, the future and the past are independent ▪ For Markov decision processes, “Markov” means action outcomes depend only on the current state Andrey Markov (1856-1922) ▪ This is just like search, where the successor function could only depend on the current state (not the history) Source: http://www.doksinet Policies ▪ In deterministic

single-agent search problems, we wanted an optimal plan, or sequence of actions, from start to a goal ▪ For MDPs, we want an optimal policy *: S A ▪ A policy  gives an action for each state ▪ An optimal policy is one that maximizes expected utility if followed ▪ An explicit policy defines a reflex agent ▪ Expectimax didn’t compute entire policies ▪ It computed the action for a single state only Optimal policy when R(s, a, s’) = -0.03 for all non-terminals s Source: http://www.doksinet Optimal Policies R(s) = -0.01 R(s) = -0.03 R(s) = -0.4 R(s) = -2.0 Source: http://www.doksinet Example: Racing Source: http://www.doksinet Example: Racing ▪ ▪ ▪ ▪ A robot car wants to travel far, quickly Three states: Cool, Warm, Overheated Two actions: Slow, Fast Going faster gets double reward 0.5 +1 Fast +1 Slow 1.0 -10 0.5 Warm Slow Fast 1.0 +1 Cool 0.5 +2 0.5 +2 Overheated Source: http://www.doksinet Racing Search Tree Source:
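To make the MDP definition concrete, the racing example above can be written down as plain Python data. The encoding used here (a dict mapping (state, action) to a list of (probability, next_state, reward) triples, with probabilities and rewards read off the racing diagram as best they can be recovered) is just an illustrative convention for the sketches that follow, not the course's project format.

racing_states = ['cool', 'warm', 'overheated']

# transitions[(s, a)] = list of (probability, next_state, reward) triples;
# 'overheated' is terminal and has no actions.
racing_transitions = {
    ('cool', 'slow'): [(1.0, 'cool', +1)],
    ('cool', 'fast'): [(0.5, 'cool', +2), (0.5, 'warm', +2)],
    ('warm', 'slow'): [(0.5, 'cool', +1), (0.5, 'warm', +1)],
    ('warm', 'fast'): [(1.0, 'overheated', -10)],
}

def racing_actions(s):
    # actions legal in state s; terminal states return an empty list
    return [a for (s2, a) in racing_transitions if s2 == s]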

http://www.doksinet MDP Search Trees ▪ Each MDP state projects an expectimax-like search tree s is a state s a (s, a) is a qstate s, a (s,a,s’) called a transition T(s,a,s’) = P(s’|s,a) s,a,s’ R(s,a,s’) s’ Source: http://www.doksinet Utilities of Sequences Source: http://www.doksinet Utilities of Sequences ▪ What preferences should an agent have over reward sequences? ▪ More or less? [1, 2, 2] or [2, 3, 4] ▪ Now or later? [0, 0, 1] or [1, 0, 0] Source: http://www.doksinet Discounting ▪ It’s reasonable to maximize the sum of rewards ▪ It’s also reasonable to prefer rewards now to rewards later ▪ One solution: values of rewards decay exponentially Worth Now Worth Next Step Worth In Two Steps Source: http://www.doksinet Discounting ▪ How to discount? ▪ Each time we descend a level, we multiply in the discount once ▪ Why discount? ▪ Sooner rewards probably do have higher utility than later rewards ▪ Also helps our

algorithms converge ▪ Example: discount of 0.5 ▪ U([1,2,3]) = 1*1 + 0.5*2 + 0.25*3 ▪ U([1,2,3]) < U([3,2,1]) Source: http://www.doksinet Stationary Preferences ▪ Theorem: if we assume stationary preferences: ▪ Then: there are only two ways to define utilities ▪ Additive utility: ▪ Discounted utility: Source: http://www.doksinet Quiz: Discounting ▪ Given: ▪ Actions: East, West, and Exit (only available in exit states a, e) ▪ Transitions: deterministic ▪ Quiz 1: For  = 1, what is the optimal policy? ▪ Quiz 2: For  = 0.1, what is the optimal policy? ▪ Quiz 3: For which  are West and East equally good when in state d? Source: http://www.doksinet Infinite Utilities?! ▪ Problem: What if the game lasts forever? Do we get infinite rewards? ▪ Solutions: ▪ Finite horizon: (similar to depth-limited search) ▪ Terminate episodes after a fixed T steps (e.g life) ▪ Gives nonstationary policies ( depends on time left) ▪ Discounting: use

0 <  < 1 ▪ Smaller  means smaller “horizon” – shorter term focus ▪ Absorbing state: guarantee that for every policy, a terminal state will eventually be reached (like “overheated” for racing) Source: http://www.doksinet Recap: Defining MDPs ▪ Markov decision processes: ▪ ▪ ▪ ▪ ▪ Set of states S Start state s0 Set of actions A Transitions P(s’|s,a) (or T(s,a,s’)) Rewards R(s,a,s’) (and discount ) ▪ MDP quantities so far: ▪ Policy = Choice of action for each state ▪ Utility = sum of (discounted) rewards s a s, a s,a,s’ s’ Source: http://www.doksinet Solving MDPs Source: http://www.doksinet Optimal Quantities ▪ The value (utility) of a state s: V*(s) = expected utility starting in s and acting optimally ▪ The value (utility) of a q-state (s,a): Q*(s,a) = expected utility starting out having taken action a from state s and (thereafter) acting optimally s s is a state a (s, a) is a q-state s, a s,a,s’ s’

(s,a,s’) is a transition ▪ The optimal policy: *(s) = optimal action from state s [Demo – gridworld values (L8D4)] Source: http://www.doksinet Snapshot of Demo – Gridworld V Values Noise = 0.2 Discount = 0.9 Living reward = 0 Source: http://www.doksinet Snapshot of Demo – Gridworld Q Values Noise = 0.2 Discount = 0.9 Living reward = 0 Source: http://www.doksinet Values of States ▪ Fundamental operation: compute the (expectimax) value of a state ▪ Expected utility under optimal action ▪ Average sum of (discounted) rewards ▪ This is just what expectimax computed! s a s, a ▪ Recursive definition of value: s,a,s’ s’ Source: http://www.doksinet Racing Search Tree Source: http://www.doksinet Racing Search Tree Source: http://www.doksinet Racing Search Tree ▪ We’re doing way too much work with expectimax! ▪ Problem: States are repeated ▪ Idea: Only compute needed quantities once ▪ Problem: Tree goes on forever ▪ Idea: Do a

depth-limited computation, but with increasing depths until change is small ▪ Note: deep parts of the tree eventually don’t matter if γ < 1 Source: http://www.doksinet Time-Limited Values ▪ Key idea: time-limited values ▪ Define Vk(s) to be the optimal value of s if the game ends in k more time steps ▪ Equivalently, it’s what a depth-k expectimax would give from s [Demo – time-limited values (L8D6)] Source: http://www.doksinet k=0 Noise = 0.2 Discount = 0.9 Living reward = 0 Source: http://www.doksinet k=1 Noise = 0.2 Discount = 0.9 Living reward = 0 Source: http://www.doksinet k=2 Noise = 0.2 Discount = 0.9 Living reward = 0 Source: http://www.doksinet k=3 Noise = 0.2 Discount = 0.9 Living reward = 0 Source: http://www.doksinet k=4 Noise = 0.2 Discount = 0.9 Living reward = 0 Source: http://www.doksinet k=5 Noise = 0.2 Discount = 0.9 Living reward = 0 Source: http://www.doksinet k=6 Noise = 0.2 Discount = 0.9 Living reward = 0

Source: http://www.doksinet k=7 Noise = 0.2 Discount = 0.9 Living reward = 0 Source: http://www.doksinet k=8 Noise = 0.2 Discount = 0.9 Living reward = 0 Source: http://www.doksinet k=9 Noise = 0.2 Discount = 0.9 Living reward = 0 Source: http://www.doksinet k=10 Noise = 0.2 Discount = 0.9 Living reward = 0 Source: http://www.doksinet k=11 Noise = 0.2 Discount = 0.9 Living reward = 0 Source: http://www.doksinet k=12 Noise = 0.2 Discount = 0.9 Living reward = 0 Source: http://www.doksinet k=100 Noise = 0.2 Discount = 0.9 Living reward = 0 Source: http://www.doksinet Computing Time-Limited Values Source: http://www.doksinet Value Iteration Source: http://www.doksinet Value Iteration ▪ Start with V0(s) = 0: no time steps left means an expected reward sum of zero ▪ Given vector of Vk(s) values, do one ply of expectimax from each state: Vk+1(s) a s, a ▪ Repeat until convergence s,a,s’ Vk(s’) ▪ Complexity of each iteration: O(S2A) ▪

Theorem: will converge to unique optimal values ▪ Basic idea: approximations get refined towards optimal values ▪ Policy may converge long before values do Source: http://www.doksinet Example: Value Iteration 3.5 2.5 0 2 1 0 Assume no discount! 0 0 0 Source: http://www.doksinet Convergence* ▪ How do we know the Vk vectors are going to converge? ▪ Case 1: If the tree has maximum depth M, then VM holds the actual untruncated values ▪ Case 2: If the discount is less than 1 ▪ Sketch: For any state Vk and Vk+1 can be viewed as depth k+1 expectimax results in nearly identical search trees ▪ The difference is that on the bottom layer, Vk+1 has actual rewards while Vk has zeros ▪ That last layer is at best all RMAX ▪ It is at worst RMIN ▪ But everything is discounted by γk that far out ▪ So Vk and Vk+1 are at most γk max|R| different ▪ So as k increases, the values converge Source: http://www.doksinet Next Time: Policy-Based Methods Source:
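A minimal Python sketch of the value iteration update above, using the illustrative MDP encoding from the racing example earlier (transitions[(s, a)] is a list of (probability, next_state, reward) triples); it runs a fixed number of sweeps rather than testing convergence, just to keep the sketch short.

def value_iteration(states, actions, transitions, gamma=0.9, iterations=100):
    V = {s: 0.0 for s in states}               # V0(s) = 0 for every state
    for _ in range(iterations):
        V_new = {}
        for s in states:
            acts = actions(s)
            if not acts:                        # terminal state: no actions, value 0
                V_new[s] = 0.0
                continue
            # one ply of expectimax: Vk+1(s) = max_a sum_s' T(s,a,s') [R(s,a,s') + gamma Vk(s')]
            V_new[s] = max(sum(p * (r + gamma * V[s2])
                               for p, s2, r in transitions[(s, a)])
                           for a in acts)
        V = V_new                               # update all states in one batch
    return V

With the racing sketch from earlier, value_iteration(racing_states, racing_actions, racing_transitions, gamma=1.0, iterations=2) should reproduce the hand-computed values in the example (3.5 for cool, 2.5 for warm, 0 for overheated).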

http://www.doksinet CS 188: Artificial Intelligence Markov Decision Processes II Instructors: Dan Klein and Pieter Abbeel --- University of California, Berkeley [These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://aiberkeleyedu] Source: http://www.doksinet Example: Grid World ▪ A maze-like problem ▪ ▪ The agent lives in a grid Walls block the agent’s path ▪ Noisy movement: actions do not always go as planned ▪ ▪ ▪ 80% of the time, the action North takes the agent North 10% of the time, North takes the agent West; 10% East If there is a wall in the direction the agent would have been taken, the agent stays put ▪ The agent receives rewards each time step ▪ ▪ Small “living” reward each step (can be negative) Big rewards come at the end (good or bad) ▪ Goal: maximize sum of (discounted) rewards Source: http://www.doksinet Recap: MDPs ▪ Markov decision

processes: ▪ ▪ ▪ ▪ ▪ States S Actions A Transitions P(s’|s,a) (or T(s,a,s’)) Rewards R(s,a,s’) (and discount ) Start state s0 s a s, a s,a,s’ ▪ Quantities: ▪ ▪ ▪ ▪ Policy = map of states to actions Utility = sum of discounted rewards Values = expected future utility from a state (max node) Q-Values = expected future utility from a q-state (chance node) s’ Source: http://www.doksinet Optimal Quantities ▪ The value (utility) of a state s: V*(s) = expected utility starting in s and acting optimally s s is a state a ▪ The value (utility) of a q-state (s,a): Q*(s,a) = expected utility starting out having taken action a from state s and (thereafter) acting optimally (s, a) is a q-state s, a s,a,s’ s’ (s,a,s’) is a transition ▪ The optimal policy: *(s) = optimal action from state s [Demo: gridworld values (L9D1)] Source: http://www.doksinet Gridworld Values V* Source: http://www.doksinet Gridworld: Q* Source:

http://www.doksinet The Bellman Equations How to be optimal: Step 1: Take correct first action Step 2: Keep being optimal Source: http://www.doksinet The Bellman Equations ▪ Definition of “optimal utility” via expectimax recurrence gives a simple one-step lookahead relationship amongst optimal utility values s a s, a s,a,s’ s’ ▪ These are the Bellman equations, and they characterize optimal values in a way we’ll use over and over Source: http://www.doksinet Value Iteration ▪ Bellman equations characterize the optimal values: V(s) a s, a s,a,s’ ▪ Value iteration computes them: ▪ Value iteration is just a fixed point solution method ▪ though the Vk vectors are also interpretable as time-limited values V(s’) Source: http://www.doksinet Convergence* ▪ How do we know the Vk vectors are going to converge? ▪ Case 1: If the tree has maximum depth M, then VM holds the actual untruncated values ▪ Case 2: If the discount is less than 1 ▪ Sketch:

For any state Vk and Vk+1 can be viewed as depth k+1 expectimax results in nearly identical search trees ▪ The difference is that on the bottom layer, Vk+1 has actual rewards while Vk has zeros ▪ That last layer is at best all RMAX ▪ It is at worst RMIN ▪ But everything is discounted by γk that far out ▪ So Vk and Vk+1 are at most γk max|R| different ▪ So as k increases, the values converge Source: http://www.doksinet Policy Methods Source: http://www.doksinet Policy Evaluation Source: http://www.doksinet Fixed Policies Do the optimal action Do what  says to do s s a (s) s, a s, (s) s, (s),s’ s,a,s’ s’ s’ ▪ Expectimax trees max over all actions to compute the optimal values ▪ If we fixed some policy (s), then the tree would be simpler – only one action per state ▪ though the tree’s value would depend on which policy we fixed Source: http://www.doksinet Utilities for a Fixed Policy ▪ Another basic operation: compute

the utility of a state s under a fixed (generally non-optimal) policy s (s) ▪ Define the utility of a state s, under a fixed policy : s, (s) V(s) = expected total discounted rewards starting in s and following  s, (s),s’ ▪ Recursive relation (one-step look-ahead / Bellman equation): s’ Source: http://www.doksinet Example: Policy Evaluation Always Go Right Always Go Forward Source: http://www.doksinet Example: Policy Evaluation Always Go Right Always Go Forward Source: http://www.doksinet Policy Evaluation ▪ How do we calculate the V’s for a fixed policy ? s (s) ▪ Idea 1: Turn recursive Bellman equations into updates (like value iteration) s, (s) s, (s),s’ s’ ▪ Efficiency: O(S2) per iteration ▪ Idea 2: Without the maxes, the Bellman equations are just a linear system ▪ Solve with Matlab (or your favorite linear system solver) Source: http://www.doksinet Policy Extraction Source: http://www.doksinet
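Idea 1 above (repeated Bellman updates with the policy's action plugged in, no max) looks like this in Python, reusing the same illustrative MDP encoding as the value-iteration sketch; pi is assumed to be a dict mapping each non-terminal state to its action.

def policy_evaluation(states, pi, transitions, gamma=0.9, iterations=100):
    # V_pi(s) <- sum_s' T(s, pi(s), s') * [R(s, pi(s), s') + gamma * V_pi(s')]
    V = {s: 0.0 for s in states}
    for _ in range(iterations):
        V_new = {}
        for s in states:
            a = pi.get(s)                       # terminal states carry no action
            if a is None:
                V_new[s] = 0.0
            else:
                V_new[s] = sum(p * (r + gamma * V[s2])
                               for p, s2, r in transitions[(s, a)])
        V = V_new
    return V

Because the max is gone, Idea 2 from the slide (treating these equations as a linear system and solving them directly) gives the same values without iterating.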

Computing Actions from Values ▪ Let’s imagine we have the optimal values V*(s) ▪ How should we act? ▪ It’s not obvious! ▪ We need to do a mini-expectimax (one step) ▪ This is called policy extraction, since it gets the policy implied by the values Source: http://www.doksinet Computing Actions from Q-Values ▪ Let’s imagine we have the optimal q-values: ▪ How should we act? ▪ Completely trivial to decide! ▪ Important lesson: actions are easier to select from q-values than values! Source: http://www.doksinet Policy Iteration Source: http://www.doksinet Problems with Value Iteration ▪ Value iteration repeats the Bellman updates: s a s, a ▪ Problem 1: It’s slow – O(S2A) per iteration s,a,s’ s’ ▪ Problem 2: The “max” at each state rarely changes ▪ Problem 3: The policy often converges long before the values [Demo: value iteration (L9D2)] Source: http://www.doksinet k=0 Noise = 0.2 Discount = 0.9 Living reward = 0 Source:

http://www.doksinet k=1 Noise = 0.2 Discount = 0.9 Living reward = 0 Source: http://www.doksinet k=2 Noise = 0.2 Discount = 0.9 Living reward = 0 Source: http://www.doksinet k=3 Noise = 0.2 Discount = 0.9 Living reward = 0 Source: http://www.doksinet k=4 Noise = 0.2 Discount = 0.9 Living reward = 0 Source: http://www.doksinet k=5 Noise = 0.2 Discount = 0.9 Living reward = 0 Source: http://www.doksinet k=6 Noise = 0.2 Discount = 0.9 Living reward = 0 Source: http://www.doksinet k=7 Noise = 0.2 Discount = 0.9 Living reward = 0 Source: http://www.doksinet k=8 Noise = 0.2 Discount = 0.9 Living reward = 0 Source: http://www.doksinet k=9 Noise = 0.2 Discount = 0.9 Living reward = 0 Source: http://www.doksinet k=10 Noise = 0.2 Discount = 0.9 Living reward = 0 Source: http://www.doksinet k=11 Noise = 0.2 Discount = 0.9 Living reward = 0 Source: http://www.doksinet k=12 Noise = 0.2 Discount = 0.9 Living reward = 0 Source: http://www.doksinet

k=100 Noise = 0.2 Discount = 0.9 Living reward = 0 Source: http://www.doksinet Policy Iteration ▪ Alternative approach for optimal values: ▪ Step 1: Policy evaluation: calculate utilities for some fixed policy (not optimal utilities!) until convergence ▪ Step 2: Policy improvement: update policy using one-step look-ahead with resulting converged (but not optimal!) utilities as future values ▪ Repeat steps until policy converges ▪ This is policy iteration ▪ It’s still optimal! ▪ Can converge (much) faster under some conditions Source: http://www.doksinet Policy Iteration ▪ Evaluation: For fixed current policy , find values with policy evaluation: ▪ Iterate until values converge: ▪ Improvement: For fixed values, get a better policy using policy extraction ▪ One-step look-ahead: Source: http://www.doksinet Comparison ▪ Both value iteration and policy iteration compute the same thing (all optimal values) ▪ In value iteration: ▪ Every iteration

updates both the values and (implicitly) the policy ▪ We don’t track the policy, but taking the max over actions implicitly recomputes it ▪ In policy iteration: ▪ We do several passes that update utilities with fixed policy (each pass is fast because we consider only one action, not all of them) ▪ After the policy is evaluated, a new policy is chosen (slow like a value iteration pass) ▪ The new policy will be better (or we’re done) ▪ Both are dynamic programs for solving MDPs Source: http://www.doksinet Summary: MDP Algorithms ▪ So you want to. ▪ Compute optimal values: use value iteration or policy iteration ▪ Compute values for a particular policy: use policy evaluation ▪ Turn your values into a policy: use policy extraction (one-step lookahead) ▪ These all look the same! ▪ They basically are – they are all variations of Bellman updates ▪ They all use one-step lookahead expectimax fragments ▪ They differ only in whether we plug in a fixed

policy or max over actions Source: http://www.doksinet Double Bandits Source: http://www.doksinet Double-Bandit MDP ▪ Actions: Blue, Red ▪ States: Win, Lose 0.25 W $0 0.75 $2 0.25 $0 $1 0.75 1.0 No discount 100 time steps Both states have the same value $2 L $1 1.0 Source: http://www.doksinet Offline Planning ▪ Solving MDPs is offline planning No discount 100 time steps Both states have the same value ▪ You determine all quantities through computation ▪ You need to know the details of the MDP ▪ You do not actually play the game! 0.25 $0 Value Play Red Play Blue 150 100 W 0.75 $2 $1 0.75 1.0 0.25 $0 L $1 $2 1.0 Source: http://www.doksinet Let’s Play! $2 $2 $0 $2 $2 $2 $2 $0 $0 $0 Source: http://www.doksinet Online Planning ▪ Rules changed! Red’s win chance is different. ?? W $0 ?? $2 ?? $0 $1 ?? 1.0 $2 L $1 1.0 Source: http://www.doksinet Let’s Play! $0 $0 $0 $2 $0 $2 $0 $0 $0 $0 Source: http://www.doksinet What

Just Happened? ▪ That wasn’t planning, it was learning! ▪ Specifically, reinforcement learning ▪ There was an MDP, but you couldn’t solve it with just computation ▪ You needed to actually act to figure it out ▪ Important ideas in reinforcement learning that came up ▪ ▪ ▪ ▪ ▪ Exploration: you have to try unknown actions to get information Exploitation: eventually, you have to use what you know Regret: even if you learn intelligently, you make mistakes Sampling: because of chance, you have to try things repeatedly Difficulty: learning can be much harder than solving a known MDP Source: http://www.doksinet Next Time: Reinforcement Learning! Source: http://www.doksinet CS 188: Artificial Intelligence Reinforcement Learning Instructors: Dan Klein and Pieter Abbeel University of California, Berkeley [These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://aiberkeleyedu]

Source: http://www.doksinet Reinforcement Learning Source: http://www.doksinet Reinforcement Learning Agent State: s Reward: r Actions: a Environment ▪ Basic idea: ▪ ▪ ▪ ▪ Receive feedback in the form of rewards Agent’s utility is defined by the reward function Must (learn to) act so as to maximize expected rewards All learning is based on observed samples of outcomes! Source: http://www.doksinet Example: Learning to Walk Initial [Kohl and Stone, ICRA 2004] A Learning Trial After Learning [1K Trials] Source: http://www.doksinet Example: Learning to Walk [Kohl and Stone, ICRA 2004] Initial [Video: AIBO WALK – initial] Source: http://www.doksinet Example: Learning to Walk [Kohl and Stone, ICRA 2004] Training [Video: AIBO WALK – training] Source: http://www.doksinet Example: Learning to Walk [Kohl and Stone, ICRA 2004] Finished [Video: AIBO WALK – finished] Source: http://www.doksinet Example: Toddler Robot [Tedrake, Zhang and Seung,

2005] [Video: TODDLER – 40s] Source: http://www.doksinet The Crawler! [Demo: Crawler Bot (L10D1)] [You, in Project 3] Source: http://www.doksinet Video of Demo Crawler Bot Source: http://www.doksinet Reinforcement Learning ▪ Still assume a Markov decision process (MDP): ▪ ▪ ▪ ▪ A set of states s  S A set of actions (per state) A A model T(s,a,s’) A reward function R(s,a,s’) ▪ Still looking for a policy (s) ▪ New twist: don’t know T or R ▪ I.e we don’t know which states are good or what the actions do ▪ Must actually try actions and states out to learn Source: http://www.doksinet Offline (MDPs) vs. Online (RL) Offline Solution Online Learning Source: http://www.doksinet Model-Based Learning Source: http://www.doksinet Model-Based Learning ▪ Model-Based Idea: ▪ Learn an approximate model based on experiences ▪ Solve for values as if the learned model were correct ▪ Step 1: Learn empirical MDP model ▪ Count outcomes

s’ for each s, a ▪ Normalize to give an estimate of ▪ Discover each when we experience (s, a, s’) ▪ Step 2: Solve the learned MDP ▪ For example, use value iteration, as before Source: http://www.doksinet Example: Model-Based Learning Input Policy  A B C E Assume:  = 1 D Observed Episodes (Training) Episode 1 Episode 2 B, east, C, -1 C, east, D, -1 D, exit, x, +10 B, east, C, -1 C, east, D, -1 D, exit, x, +10 Episode 3 Episode 4 E, north, C, -1 C, east, D, -1 D, exit, x, +10 E, north, C, -1 C, east, A, -1 A, exit, x, -10 Learned Model T(s,a,s’). T(B, east, C) = 1.00 T(C, east, D) = 0.75 T(C, east, A) = 0.25 R(s,a,s’). R(B, east, C) = -1 R(C, east, D) = -1 R(D, exit, x) = +10 Source: http://www.doksinet Example: Expected Age Goal: Compute expected age of cs188 students Known P(A) Without P(A), instead collect samples [a1, a2, aN] Unknown P(A): “Model Based” Why does this work? Because eventually you learn the right model. Unknown P(A):

“Model Free” Why does this work? Because samples appear with the right frequencies. Source: http://www.doksinet Model-Free Learning Source: http://www.doksinet Passive Reinforcement Learning Source: http://www.doksinet Passive Reinforcement Learning ▪ Simplified task: policy evaluation ▪ ▪ ▪ ▪ Input: a fixed policy (s) You don’t know the transitions T(s,a,s’) You don’t know the rewards R(s,a,s’) Goal: learn the state values ▪ In this case: ▪ ▪ ▪ ▪ Learner is “along for the ride” No choice about what actions to take Just execute the policy and learn from experience This is NOT offline planning! You actually take actions in the world. Source: http://www.doksinet Direct Evaluation ▪ Goal: Compute values for each state under  ▪ Idea: Average together observed sample values ▪ Act according to  ▪ Every time you visit a state, write down what the sum of discounted rewards turned out to be ▪ Average those samples ▪ This

is called direct evaluation Source: http://www.doksinet Example: Direct Evaluation Input Policy  A B C E Assume:  = 1 D Observed Episodes (Training) Episode 1 Episode 2 B, east, C, -1 C, east, D, -1 D, exit, x, +10 B, east, C, -1 C, east, D, -1 D, exit, x, +10 Output Values A B Episode 3 Episode 4 E, north, C, -1 C, east, D, -1 D, exit, x, +10 E, north, C, -1 C, east, A, -1 A, exit, x, -10 +8 C E -10 +4 -2 +10 D Source: http://www.doksinet Problems with Direct Evaluation ▪ What’s good about direct evaluation? ▪ It’s easy to understand ▪ It doesn’t require any knowledge of T, R ▪ It eventually computes the correct average values, using just sample transitions ▪ What bad about it? ▪ It wastes information about state connections ▪ Each state must be learned separately ▪ So, it takes a long time to learn Output Values -10 A +8 +4 +10 B C D -2 E If B and E both go to C under this policy, how can their values be different? Source:
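Direct evaluation as described above is just bookkeeping over episodes. A minimal sketch, assuming each episode is a list of (state, action, next_state, reward) tuples like the training episodes in the example:

def direct_evaluation(episodes, gamma=1.0):
    # For every visit to a state, record the discounted return that followed,
    # then average the recorded returns per state.
    totals, counts = {}, {}
    for episode in episodes:
        rewards = [r for (_, _, _, r) in episode]
        for t, (s, _, _, _) in enumerate(episode):
            G = sum(gamma ** k * r for k, r in enumerate(rewards[t:]))
            totals[s] = totals.get(s, 0.0) + G
            counts[s] = counts.get(s, 0) + 1
    return {s: totals[s] / counts[s] for s in totals}

Run on the four training episodes above (with gamma = 1), this should reproduce the output values shown (+8, +4, +10, -10, -2), including the puzzle that B and E end up different even though both pass through C.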

http://www.doksinet Why Not Use Policy Evaluation? ▪ Simplified Bellman updates calculate V for a fixed policy: s ▪ Each round, replace V with a one-step-look-ahead layer over V (s) s, (s) s, (s),s’ s’ ▪ This approach fully exploited the connections between the states ▪ Unfortunately, we need T and R to do it! ▪ Key question: how can we do this update to V without knowing T and R? ▪ In other words, how to we take a weighted average without knowing the weights? Source: http://www.doksinet Sample-Based Policy Evaluation? ▪ We want to improve our estimate of V by computing these averages: ▪ Idea: Take samples of outcomes s’ (by doing the action!) and average s (s) s, (s) s, (s),s’ s2 s 1 s s3 Almost! But we can’t rewind time to get sample after sample from state s. Source: http://www.doksinet Temporal Difference Learning ▪ Big idea: learn from every experience! ▪ Update V(s) each time we experience a transition (s, a, s’,

r) ▪ Likely outcomes s’ will contribute updates more often s (s) s, (s) ▪ Temporal difference learning of values ▪ Policy still fixed, still doing evaluation! ▪ Move values toward value of whatever successor occurs: running average Sample of V(s): Update to V(s): Same update: s’ Source: http://www.doksinet Exponential Moving Average ▪ Exponential moving average ▪ The running interpolation update: ▪ Makes recent samples more important: ▪ Forgets about the past (distant past values were wrong anyway) ▪ Decreasing learning rate (alpha) can give converging averages Source: http://www.doksinet Example: Temporal Difference Learning States Observed Transitions B, east, C, -2 A B C 0 D E Assume:  = 1, α = 1/2 0 0 0 C, east, D, -2 0 8 -1 0 0 0 8 -1 3 0 8 Source: http://www.doksinet Problems with TD Value Learning ▪ TD value leaning is a model-free way to do policy evaluation, mimicking Bellman updates with running sample

averages ▪ However, if we want to turn values into a (new) policy, we’re sunk: s a s, a ▪ Idea: learn Q-values, not values ▪ Makes action selection model-free too! s,a,s’ s’ Source: http://www.doksinet Active Reinforcement Learning Source: http://www.doksinet Active Reinforcement Learning ▪ Full reinforcement learning: optimal policies (like value iteration) ▪ ▪ ▪ ▪ You don’t know the transitions T(s,a,s’) You don’t know the rewards R(s,a,s’) You choose the actions now Goal: learn the optimal policy / values ▪ In this case: ▪ Learner makes choices! ▪ Fundamental tradeoff: exploration vs. exploitation ▪ This is NOT offline planning! You actually take actions in the world and find out what happens Source: http://www.doksinet Detour: Q-Value Iteration ▪ Value iteration: find successive (depth-limited) values ▪ Start with V0(s) = 0, which we know is right ▪ Given Vk, calculate the depth k+1 values for all states: ▪ But Q-values

are more useful, so compute them instead ▪ Start with Q0(s,a) = 0, which we know is right ▪ Given Qk, calculate the depth k+1 q-values for all q-states: Source: http://www.doksinet Q-Learning ▪ Q-Learning: sample-based Q-value iteration ▪ Learn Q(s,a) values as you go ▪ Receive a sample (s,a,s’,r) ▪ Consider your old estimate: ▪ Consider your new sample estimate: ▪ Incorporate the new estimate into a running average: [Demo: Q-learning – gridworld (L10D2)] [Demo: Q-learning – crawler (L10D3)] Source: http://www.doksinet Video of Demo Q-Learning -- Gridworld Source: http://www.doksinet Video of Demo Q-Learning -- Crawler Source: http://www.doksinet Q-Learning Properties ▪ Amazing result: Q-learning converges to optimal policy -- even if you’re acting suboptimally! ▪ This is called off-policy learning ▪ Caveats: ▪ You have to explore enough ▪ You have to eventually make the learning rate small enough ▪ but not decrease it too quickly ▪

Basically, in the limit, it doesn’t matter how you select actions (!) Source: http://www.doksinet CS 188: Artificial Intelligence Reinforcement Learning II Instructors: Dan Klein and Pieter Abbeel --- University of California, Berkeley [These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://aiberkeleyedu] Source: http://www.doksinet Reinforcement Learning ▪ We still assume an MDP: ▪ ▪ ▪ ▪ A set of states s  S A set of actions (per state) A A model T(s,a,s’) A reward function R(s,a,s’) ▪ Still looking for a policy (s) ▪ New twist: don’t know T or R, so must try out actions ▪ Big idea: Compute all averages over T using sample outcomes Source: http://www.doksinet The Story So Far: MDPs and RL Known MDP: Offline Solution Goal Technique Compute V*, Q,  Value / policy iteration Evaluate a fixed policy  Policy evaluation Unknown MDP: Model-Based Unknown

MDP: Model-Free Goal Technique Goal Technique Compute V*, Q,  VI/PI on approx. MDP Compute V*, Q,  Q-learning Evaluate a fixed policy  PE on approx. MDP Evaluate a fixed policy  Value Learning Source: http://www.doksinet Model-Free Learning ▪ Model-free (temporal difference) learning ▪ Experience world through episodes s a s, a r s’ ▪ Update estimates each transition a’ s’, a’ ▪ Over time, updates will mimic Bellman updates s’’ Source: http://www.doksinet Q-Learning ▪ We’d like to do Q-value updates to each Q-state: ▪ But can’t compute this update without knowing T, R ▪ Instead, compute average as we go ▪ Receive a sample transition (s,a,r,s’) ▪ This sample suggests ▪ But we want to average over results from (s,a) (Why?) ▪ So keep a running average Source: http://www.doksinet Q-Learning Properties ▪ Amazing result: Q-learning converges to optimal policy -- even if you’re acting suboptimally! ▪ This

is called off-policy learning ▪ Caveats: ▪ You have to explore enough ▪ You have to eventually make the learning rate small enough ▪ but not decrease it too quickly ▪ Basically, in the limit, it doesn’t matter how you select actions (!) [Demo: Q-learning – auto – cliff grid (L11D1)] Source: http://www.doksinet Video of Demo Q-Learning Auto Cliff Grid Source: http://www.doksinet Exploration vs. Exploitation Source: http://www.doksinet How to Explore? ▪ Several schemes for forcing exploration ▪ Simplest: random actions (-greedy) ▪ Every time step, flip a coin ▪ With (small) probability , act randomly ▪ With (large) probability 1-, act on current policy ▪ Problems with random actions? ▪ You do eventually explore the space, but keep thrashing around once learning is done ▪ One solution: lower  over time ▪ Another solution: exploration functions [Demo: Q-learning – manual exploration – bridge grid (L11D2)] [Demo: Q-learning –

epsilon-greedy -- crawler (L11D3)] Source: http://www.doksinet Video of Demo Q-learning – Manual Exploration – Bridge Grid Source: http://www.doksinet Video of Demo Q-learning – Epsilon-Greedy – Crawler Source: http://www.doksinet Exploration Functions ▪ When to explore? ▪ Random actions: explore a fixed amount ▪ Better idea: explore areas whose badness is not (yet) established, eventually stop exploring ▪ Exploration function ▪ Takes a value estimate u and a visit count n, and returns an optimistic utility, e.g Regular Q-Update: Modified Q-Update: ▪ Note: this propagates the “bonus” back to states that lead to unknown states as well! [Demo: exploration – Q-learning – crawler – exploration function (L11D4)] Source: http://www.doksinet Video of Demo Q-learning – Exploration Function – Crawler Source: http://www.doksinet Regret ▪ Even if you learn the optimal policy, you still make mistakes along the way! ▪ Regret is a measure of

your total mistake cost: the difference between your (expected) rewards, including youthful suboptimality, and optimal (expected) rewards ▪ Minimizing regret goes beyond learning to be optimal – it requires optimally learning to be optimal ▪ Example: random exploration and exploration functions both end up optimal, but random exploration has higher regret Source: http://www.doksinet Approximate Q-Learning Source: http://www.doksinet Generalizing Across States ▪ Basic Q-Learning keeps a table of all q-values ▪ In realistic situations, we cannot possibly learn about every single state! ▪ Too many states to visit them all in training ▪ Too many states to hold the q-tables in memory ▪ Instead, we want to generalize: ▪ Learn about some small number of training states from experience ▪ Generalize that experience to new, similar situations ▪ This is a fundamental idea in machine learning, and we’ll see it over and over again [demo – RL pacman] Source:
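Pulling the Q-learning update and ε-greedy exploration from the slides above into one tabular sketch; the environment interface (reset(), step(a) returning (next_state, reward, done), actions(s)) is an assumption for illustration, not the course's project API.

import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1):
    Q = defaultdict(float)                        # Q[(s, a)] starts at 0
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            acts = env.actions(s)
            if random.random() < epsilon:         # explore: random action
                a = random.choice(acts)
            else:                                  # exploit: current best action
                a = max(acts, key=lambda act: Q[(s, act)])
            s2, r, done = env.step(a)
            # sample = r + gamma * max_a' Q(s', a'); no future value if the episode ended
            future = 0.0 if done else max(Q[(s2, a2)] for a2 in env.actions(s2))
            sample = r + gamma * future
            # running average toward the sample (the Q-learning update above)
            Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * sample
            s = s2
    return Q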

http://www.doksinet Example: Pacman Let’s say we discover through experience that this state is bad: In naïve q-learning, we know nothing about this state: Or even this one! [Demo: Q-learning – pacman – tiny – watch all (L11D5)] [Demo: Q-learning – pacman – tiny – silent train (L11D6)] [Demo: Q-learning – pacman – tricky – watch all (L11D7)] Source: http://www.doksinet Video of Demo Q-Learning Pacman – Tiny – Watch All Source: http://www.doksinet Video of Demo Q-Learning Pacman – Tiny – Silent Train Source: http://www.doksinet Video of Demo Q-Learning Pacman – Tricky – Watch All Source: http://www.doksinet Feature-Based Representations ▪ Solution: describe a state using a vector of features (properties) ▪ Features are functions from states to real numbers (often 0/1) that capture important properties of the state ▪ Example features: ▪ ▪ ▪ ▪ ▪ ▪ ▪ Distance to closest ghost Distance to closest dot Number of ghosts 1 /

(dist to dot)2 Is Pacman in a tunnel? (0/1) etc. Is it the exact state on this slide? ▪ Can also describe a q-state (s, a) with features (e.g action moves closer to food) Source: http://www.doksinet Linear Value Functions ▪ Using a feature representation, we can write a q function (or value function) for any state using a few weights: ▪ Advantage: our experience is summed up in a few powerful numbers ▪ Disadvantage: states may share features but actually be very different in value! Source: http://www.doksinet Approximate Q-Learning ▪ Q-learning with linear Q-functions: Exact Q’s Approximate Q’s ▪ Intuitive interpretation: ▪ Adjust weights of active features ▪ E.g, if something unexpectedly bad happens, blame the features that were on: disprefer all states with that state’s features ▪ Formal justification: online least squares Source: http://www.doksinet Example: Q-Pacman [Demo: approximate Qlearning pacman (L11D10)] Source: http://www.doksinet
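The approximate Q-learning update above (adjust the weights of the active features by the TD difference) is only a few lines of Python; the feature extractor is assumed to return a dict of {feature_name: value}, and any names used are purely illustrative.

def linear_q(weights, features):
    # Q(s, a) = sum_i w_i * f_i(s, a)
    return sum(weights.get(f, 0.0) * v for f, v in features.items())

def approx_q_update(weights, features, reward, best_next_q, alpha=0.01, gamma=0.9):
    # difference = [r + gamma * max_a' Q(s', a')] - Q(s, a)
    difference = (reward + gamma * best_next_q) - linear_q(weights, features)
    for f, v in features.items():
        # w_i <- w_i + alpha * difference * f_i(s, a): credit or blame the active features
        weights[f] = weights.get(f, 0.0) + alpha * difference * v

With features like distance-to-closest-dot or number-of-nearby-ghosts, states that have never been visited still get sensible Q estimates, because they share features with states that have.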

Video of Demo Approximate Q-Learning -- Pacman Source: http://www.doksinet Q-Learning and Least Squares Source: http://www.doksinet Linear Approximation: Regression* 40 26 24 20 22 20 30 40 20 0 0 20 30 20 10 10 0 Prediction: Prediction: 0 Source: http://www.doksinet Optimization: Least Squares* Error or “residual” Observation Prediction 0 0 20 Source: http://www.doksinet Minimizing Error* Imagine we had only one point x, with features f(x), target value y, and weights w: Approximate q update explained: “target” “prediction” Source: http://www.doksinet Overfitting: Why Limiting Capacity Can Help* 30 25 20 Degree 15 polynomial 15 10 5 0 -5 -10 -15 0 2 4 6 8 10 12 14 16 18 20 Source: http://www.doksinet Policy Search Source: http://www.doksinet Policy Search ▪ Problem: often the feature-based policies that work well (win games, maximize utilities) aren’t the ones that approximate V / Q best ▪ E.g your value

functions from project 2 were probably horrible estimates of future rewards, but they still produced good decisions ▪ Q-learning’s priority: get Q-values close (modeling) ▪ Action selection priority: get ordering of Q-values right (prediction) ▪ We’ll see this distinction between modeling and prediction again later in the course ▪ Solution: learn policies that maximize rewards, not the values that predict them ▪ Policy search: start with an ok solution (e.g Q-learning) then fine-tune by hill climbing on feature weights Source: http://www.doksinet Policy Search ▪ Simplest policy search: ▪ Start with an initial linear value function or Q-function ▪ Nudge each feature weight up and down and see if your policy is better than before ▪ Problems: ▪ How do we tell the policy got better? ▪ Need to run many sample episodes! ▪ If there are a lot of features, this can be impractical ▪ Better methods exploit lookahead structure, sample wisely, change multiple

parameters Source: http://www.doksinet Policy Search [Andrew Ng] [Video: HELICOPTER] Source: http://www.doksinet Conclusion ▪ We’re done with Part I: Search and Planning! ▪ We’ve seen how AI methods can solve problems in: ▪ ▪ ▪ ▪ ▪ Search Constraint Satisfaction Problems Games Markov Decision Problems Reinforcement Learning ▪ Next up: Part II: Uncertainty and Learning! Source: http://www.doksinet Our Status in CS188 ▪ We’re done with Part I Search and Planning! ▪ Part II: Probabilistic Reasoning ▪ ▪ ▪ ▪ ▪ ▪ ▪ Diagnosis Speech recognition Tracking objects Robot mapping Genetics Error correcting codes lots more! ▪ Part III: Machine Learning Source: http://www.doksinet CS 188: Artificial Intelligence Probability Instructors: Dan Klein and Pieter Abbeel --- University of California, Berkeley [These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at

http://aiberkeleyedu] Source: http://www.doksinet Today ▪ Probability ▪ ▪ ▪ ▪ ▪ ▪ Random Variables Joint and Marginal Distributions Conditional Distribution Product Rule, Chain Rule, Bayes’ Rule Inference Independence ▪ You’ll need all this stuff A LOT for the next few weeks, so make sure you go over it now! Source: http://www.doksinet Inference in Ghostbusters ▪ A ghost is in the grid somewhere ▪ Sensor readings tell how close a square is to the ghost ▪ ▪ ▪ ▪ On the ghost: red 1 or 2 away: orange 3 or 4 away: yellow 5+ away: green ▪ Sensors are noisy, but we know P(Color | Distance) P(red | 3) P(orange | 3) P(yellow | 3) P(green | 3) 0.05 0.15 0.5 0.3 [Demo: Ghostbuster – no probability (L12D1) ] Source: http://www.doksinet Video of Demo Ghostbuster – No probability Source: http://www.doksinet Uncertainty ▪ General situation: ▪ Observed variables (evidence): Agent knows certain things about the state of the world (e.g,

sensor readings or symptoms) ▪ Unobserved variables: Agent needs to reason about other aspects (e.g where an object is or what disease is present) ▪ Model: Agent knows something about how the known variables relate to the unknown variables ▪ Probabilistic reasoning gives us a framework for managing our beliefs and knowledge Source: http://www.doksinet Random Variables ▪ A random variable is some aspect of the world about which we (may) have uncertainty ▪ ▪ ▪ ▪ R = Is it raining? T = Is it hot or cold? D = How long will it take to drive to work? L = Where is the ghost? ▪ We denote random variables with capital letters ▪ Like variables in a CSP, random variables have domains ▪ ▪ ▪ ▪ R in {true, false} (often write as {+r, -r}) T in {hot, cold} D in [0, ) L in possible locations, maybe {(0,0), (0,1), } Source: http://www.doksinet Probability Distributions ▪ Associate a probability with each value ▪ Weather: ▪ Temperature: T P hot 0.5

cold 0.5 W P sun 0.6 rain 0.1 fog 0.3 meteor 0.0 Source: http://www.doksinet Probability Distributions ▪ Unobserved random variables have distributions T P W P hot 0.5 sun 0.6 cold 0.5 rain 0.1 fog 0.3 meteor 0.0 ▪ A distribution is a TABLE of probabilities of values ▪ A probability (lower case value) is a single number ▪ Must have: and Shorthand notation: OK if all domain entries are unique Source: http://www.doksinet Joint Distributions ▪ A joint distribution over a set of random variables: specifies a real number for each assignment (or outcome): ▪ Must obey: ▪ Size of distribution if n variables with domain sizes d? ▪ For all but the smallest distributions, impractical to write out! T W P hot sun 0.4 hot rain 0.1 cold sun 0.2 cold rain 0.3 Source: http://www.doksinet Probabilistic Models ▪ A probabilistic model is a joint distribution over a set of random variables Distribution over T,W T W P ▪

Probabilistic models: hot sun 0.4 hot rain 0.1 cold sun 0.2 cold rain 0.3 ▪ (Random) variables with domains ▪ Assignments are called outcomes ▪ Joint distributions: say whether assignments (outcomes) are likely ▪ Normalized: sum to 1.0 ▪ Ideally: only certain variables directly interact ▪ Constraint satisfaction problems: ▪ Variables with domains ▪ Constraints: state whether assignments are possible ▪ Ideally: only certain variables directly interact Constraint over T,W T W P hot sun T hot rain F cold sun F cold rain T Source: http://www.doksinet Events ▪ An event is a set E of outcomes ▪ From a joint distribution, we can calculate the probability of any event ▪ Probability that it’s hot AND sunny? ▪ Probability that it’s hot? ▪ Probability that it’s hot OR sunny? ▪ Typically, the events we care about are partial assignments, like P(T=hot) T W P hot sun 0.4 hot rain 0.1 cold sun 0.2 cold rain 0.3
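The event probabilities on this slide can be computed mechanically from the joint table: sum the probabilities of the outcomes that belong to the event. A minimal sketch using the P(T, W) table above:

# joint P(T, W) from the table above
joint = {('hot', 'sun'): 0.4, ('hot', 'rain'): 0.1,
         ('cold', 'sun'): 0.2, ('cold', 'rain'): 0.3}

def prob(event):
    # event is a predicate over outcomes (t, w); sum the matching probabilities
    return sum(p for outcome, p in joint.items() if event(outcome))

print(prob(lambda o: o == ('hot', 'sun')))               # hot AND sunny: 0.4
print(prob(lambda o: o[0] == 'hot'))                     # hot: 0.5
print(prob(lambda o: o[0] == 'hot' or o[1] == 'sun'))    # hot OR sunny: 0.7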

Source: http://www.doksinet Quiz: Events ▪ P(+x, +y) ? ▪ P(+x) ? ▪ P(-y OR +x) ? X Y P +x +y 0.2 +x -y 0.3 -x +y 0.4 -x -y 0.1 Source: http://www.doksinet Marginal Distributions ▪ Marginal distributions are sub-tables which eliminate variables ▪ Marginalization (summing out): Combine collapsed rows by adding T P hot 0.5 cold 0.5 T W P hot sun 0.4 hot rain 0.1 cold sun 0.2 W P cold rain 0.3 sun 0.6 rain 0.4 Source: http://www.doksinet Quiz: Marginal Distributions X P +x X Y P +x +y 0.2 +x -y 0.3 -x +y 0.4 Y -x -y 0.1 +y -x -y P Source: http://www.doksinet Conditional Probabilities ▪ A simple relation between joint and conditional probabilities ▪ In fact, this is taken as the definition of a conditional probability P(a,b) P(a) T W P hot sun 0.4 hot rain 0.1 cold sun 0.2 cold rain 0.3 P(b) Source: http://www.doksinet Quiz: Conditional Probabilities ▪ P(+x | +y) ? X Y P +x +y

0.2 +x -y 0.3 -x +y 0.4 -x -y 0.1 ▪ P(-x | +y) ? ▪ P(-y | +x) ? Source: http://www.doksinet Conditional Distributions ▪ Conditional distributions are probability distributions over some variables given fixed values of others Conditional Distributions W P sun 0.8 rain 0.2 W P sun 0.4 rain 0.6 Joint Distribution T W P hot sun 0.4 hot rain 0.1 cold sun 0.2 cold rain 0.3 Source: http://www.doksinet Normalization Trick T W P hot sun 0.4 hot rain 0.1 cold sun 0.2 cold rain 0.3 W P sun 0.4 rain 0.6 Source: http://www.doksinet Normalization Trick T W P hot sun 0.4 hot rain cold cold SELECT the joint probabilities matching the evidence NORMALIZE the selection (make it sum to one) T W P W P 0.1 cold sun 0.2 sun 0.4 sun 0.2 cold rain 0.3 rain 0.6 rain 0.3 Source: http://www.doksinet Normalization Trick T W P hot sun 0.4 hot rain cold cold SELECT the joint probabilities matching

the evidence NORMALIZE the selection (make it sum to one) T W P W P 0.1 cold sun 0.2 sun 0.4 sun 0.2 cold rain 0.3 rain 0.6 rain 0.3 ▪ Why does this work? Sum of selection is P(evidence)! (P(T=c), here) Source: http://www.doksinet Quiz: Normalization Trick ▪ P(X | Y=-y) ? X Y P +x +y 0.2 +x -y 0.3 -x +y 0.4 -x -y 0.1 SELECT the joint probabilities matching the evidence NORMALIZE the selection (make it sum to one) Source: http://www.doksinet To Normalize ▪ (Dictionary) To bring or restore to a normal condition All entries sum to ONE ▪ Procedure: ▪ Step 1: Compute Z = sum over all entries ▪ Step 2: Divide every entry by Z ▪ Example 1 W P sun 0.2 rain 0.3 ▪ Example 2 Normalize Z = 0.5 W P T W P sun 0.4 hot sun 20 rain 0.6 hot rain 5 cold sun 10 cold rain 15 Normalize Z = 50 T W P hot sun 0.4 hot rain 0.1 cold sun 0.2 cold rain 0.3 Source: http://www.doksinet Probabilistic Inference

▪ Probabilistic inference: compute a desired probability from other known probabilities (e.g. conditional from joint) ▪ We generally compute conditional probabilities ▪ P(on time | no reported accidents) = 0.90 ▪ These represent the agent’s beliefs given the evidence ▪ Probabilities change with new evidence: ▪ P(on time | no accidents, 5 a.m.) = 0.95 ▪ P(on time | no accidents, 5 a.m., raining) = 0.80 ▪ Observing new evidence causes beliefs to be updated Source: http://www.doksinet Inference by Enumeration ▪ General case: ▪ Evidence variables: ▪ Query* variable: ▪ Hidden variables: ▪ Step 1: Select the entries consistent with the evidence ▪ We want: * Works fine with multiple query variables, too All variables ▪ Step 2: Sum out H to get joint of Query and evidence ▪ Step 3: Normalize Source: http://www.doksinet Inference by Enumeration ▪ P(W)? ▪ P(W | winter)? ▪ P(W | winter, hot)? S T W P summer hot sun 0.30 summer hot
rain 0.05 summer cold sun 0.10 summer cold rain 0.05 winter hot sun 0.10 winter hot rain 0.05 winter cold sun 0.15 winter cold rain 0.20 Source: http://www.doksinet Inference by Enumeration ▪ Obvious problems: ▪ Worst-case time complexity O(d^n) ▪ Space complexity O(d^n) to store the joint distribution Source: http://www.doksinet The Product Rule ▪ Sometimes have conditional distributions but want the joint Source: http://www.doksinet The Product Rule ▪ Example: R P sun 0.8 rain 0.2 D W P D W P wet sun 0.1 wet sun 0.08 dry sun 0.9 dry sun 0.72 wet rain 0.7 wet rain 0.14 dry rain 0.3 dry rain 0.06 Source: http://www.doksinet The Chain Rule ▪ More generally, can always write any joint distribution as an incremental product of conditional distributions ▪ Why is this always true? Source: http://www.doksinet Bayes Rule Source: http://www.doksinet Bayes’ Rule ▪ Two ways to factor a joint

distribution over two variables: That’s my rule! ▪ Dividing, we get: ▪ Why is this at all helpful? ▪ Lets us build one conditional from its reverse ▪ Often one conditional is tricky but the other one is simple ▪ Foundation of many systems we’ll see later (e.g ASR, MT) ▪ In the running for most important AI equation! Source: http://www.doksinet Inference with Bayes’ Rule ▪ Example: Diagnostic probability from causal probability: ▪ Example: ▪ M: meningitis, S: stiff neck Example givens ▪ Note: posterior probability of meningitis still very small ▪ Note: you should still get stiff necks checked out! Why? Source: http://www.doksinet Quiz: Bayes’ Rule ▪ Given: R P sun 0.8 rain 0.2 ▪ What is P(W | dry) ? D W P wet sun 0.1 dry sun 0.9 wet rain 0.7 dry rain 0.3 Source: http://www.doksinet Ghostbusters, Revisited ▪ Let’s say we have two distributions: ▪ Prior distribution over ghost location: P(G) ▪ Let’s say this is

uniform ▪ Sensor reading model: P(R | G) ▪ Given: we know what our sensors do ▪ R = reading color measured at (1,1) ▪ E.g P(R = yellow | G=(1,1)) = 01 ▪ We can calculate the posterior distribution P(G|r) over ghost locations given a reading using Bayes’ rule: [Demo: Ghostbuster – with probability (L12D2) ] Source: http://www.doksinet Video of Demo Ghostbusters with Probability Source: http://www.doksinet Next Time: Markov Models Source: http://www.doksinet AI Outside of 188: Angry Birds Competition ▪ http://www.aibirdsorg Source: http://www.doksinet CS 188: Artificial Intelligence Markov Models Instructors: Dan Klein and Pieter Abbeel --- University of California, Berkeley [These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://aiberkeleyedu] Source: http://www.doksinet Independence ▪ Two variables are independent in a joint distribution if: ▪ Says the joint

distribution factors into a product of two simple ones ▪ Usually variables aren’t independent! ▪ Can use independence as a modeling assumption ▪ Independence can be a simplifying assumption ▪ Empirical joint distributions: at best “close” to independent ▪ What could we assume for {Weather, Traffic, Cavity}? ▪ Independence is like something from CSPs: what? Source: http://www.doksinet Example: Independence? T P hot 0.5 cold 0.5 T W P T W P hot sun 0.4 hot sun 0.3 hot rain 0.1 hot rain 0.2 cold sun 0.2 cold sun 0.3 cold rain 0.3 cold rain 0.2 W P sun 0.6 rain 0.4 Source: http://www.doksinet Example: Independence ▪ N fair, independent coin flips: H 0.5 H 0.5 H 0.5 T 0.5 T 0.5 T 0.5 Source: http://www.doksinet Conditional Independence Source: http://www.doksinet Conditional Independence ▪ P(Toothache, Cavity, Catch) ▪ If I have a cavity, the probability that the probe catches in it doesnt depend on

whether I have a toothache: ▪ P(+catch | +toothache, +cavity) = P(+catch | +cavity) ▪ The same independence holds if I don’t have a cavity: ▪ P(+catch | +toothache, -cavity) = P(+catch| -cavity) ▪ Catch is conditionally independent of Toothache given Cavity: ▪ P(Catch | Toothache, Cavity) = P(Catch | Cavity) ▪ Equivalent statements: ▪ P(Toothache | Catch , Cavity) = P(Toothache | Cavity) ▪ P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity) ▪ One can be derived from the other easily Source: http://www.doksinet Conditional Independence ▪ Unconditional (absolute) independence very rare (why?) ▪ Conditional independence is our most basic and robust form of knowledge about uncertain environments. ▪ X is conditionally independent of Y given Z if and only if: or, equivalently, if and only if Source: http://www.doksinet Conditional Independence ▪ What about this domain: ▪ Traffic ▪ Umbrella ▪ Raining Source: http://www.doksinet
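A small sketch (not from the course materials) of how a conditional-independence claim like the one above can be checked mechanically from a full joint table: test P(x, y | z) = P(x | z) P(y | z) for every assignment. The toothache/catch/cavity numbers are not given on the slides, so the CPTs below are invented purely for illustration and are built so that the independence holds by construction.

from itertools import product

def conditionally_independent(joint, tol=1e-9):
    # joint maps (x, y, z) -> probability; checks whether X is independent of Y given Z.
    xs = {x for x, _, _ in joint}
    ys = {y for _, y, _ in joint}
    zs = {z for _, _, z in joint}
    def p(pred):
        return sum(pr for xyz, pr in joint.items() if pred(*xyz))
    for x, y, z in product(xs, ys, zs):
        pz = p(lambda X, Y, Z: Z == z)
        if pz == 0:
            continue
        lhs = p(lambda X, Y, Z: (X, Y, Z) == (x, y, z)) / pz              # P(x, y | z)
        rhs = (p(lambda X, Y, Z: X == x and Z == z) / pz) * \
              (p(lambda X, Y, Z: Y == y and Z == z) / pz)                 # P(x | z) P(y | z)
        if abs(lhs - rhs) > tol:
            return False
    return True

# Invented numbers: P(Cavity), P(Toothache | Cavity), P(Catch | Cavity).
P_c = {'+c': 0.2, '-c': 0.8}
P_t = {('+t', '+c'): 0.6, ('-t', '+c'): 0.4, ('+t', '-c'): 0.1, ('-t', '-c'): 0.9}
P_k = {('+k', '+c'): 0.9, ('-k', '+c'): 0.1, ('+k', '-c'): 0.2, ('-k', '-c'): 0.8}
joint = {(t, k, c): P_c[c] * P_t[(t, c)] * P_k[(k, c)]
         for t in ('+t', '-t') for k in ('+k', '-k') for c in ('+c', '-c')}
print(conditionally_independent(joint))   # True: Toothache and Catch independent given Cavity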

Conditional Independence ▪ What about this domain: ▪ Fire ▪ Smoke ▪ Alarm Source: http://www.doksinet Probability Recap ▪ Conditional probability ▪ Product rule ▪ Chain rule ▪ X, Y independent if and only if: ▪ X and Y are conditionally independent given Z if and only if: Source: http://www.doksinet CS 188: Artificial Intelligence Markov Models Instructors: Dan Klein and Pieter Abbeel --- University of California, Berkeley [These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://aiberkeleyedu] Source: http://www.doksinet Reasoning over Time or Space ▪ Often, we want to reason about a sequence of observations ▪ Speech recognition ▪ Robot localization ▪ User attention ▪ Medical monitoring ▪ Need to introduce time (or space) into our models Source: http://www.doksinet Markov Models ▪ Value of X at a given time is called the state X1 X2 X3 X4 ▪

Parameters: called transition probabilities or dynamics, specify how the state evolves over time (also, initial state probabilities) ▪ Stationarity assumption: transition probabilities the same at all times ▪ Same as MDP transition model, but no choice of action Source: http://www.doksinet Joint Distribution of a Markov Model X1 X2 X3 X4 ▪ Joint distribution: ▪ More generally: ▪ Questions to be resolved: ▪ Does this indeed define a joint distribution? ▪ Can every joint distribution be factored this way, or are we making some assumptions about the joint distribution by using this factorization? Source: http://www.doksinet Chain Rule and Markov Models X1 X2 X3 X4 ▪ From the chain rule, every joint distribution over ▪ Assuming that and results in the expression posited on the previous slide: can be written as: Source: http://www.doksinet Chain Rule and Markov Models X1 X2 X3 ▪ From the chain rule, every joint distribution over ▪ Assuming that

for all t: gives us the expression posited on the earlier slide: X4 can be written as: Source: http://www.doksinet Implied Conditional Independencies X1 ▪ We assumed: ▪ Do we also have ▪ Yes! ▪ Proof: X2 X3 X4 and ? Source: http://www.doksinet Markov Models Recap ▪ Explicit assumption for all t : ▪ Consequence, joint distribution can be written as: ▪ Implied conditional independencies: (try to prove this!) ▪ Past variables independent of future variables given the present i.e, if or then: ▪ Additional explicit assumption: is the same for all t Source: http://www.doksinet Example Markov Chain: Weather ▪ States: X = {rain, sun} ▪ Initial distribution: 1.0 sun ▪ CPT P(Xt | Xt-1): Xt-1 Xt P(Xt|Xt-1) sun sun 0.9 sun rain 0.1 rain sun 0.3 rain rain 0.7 Two new ways of representing the same CPT 0.9 0.3 rain sun sun rain 0.7 0.1 0.9 0.1 0.3 0.7 sun rain Source: http://www.doksinet Example Markov Chain: Weather ▪ Initial

distribution: 1.0 sun 0.9 0.3 rain 0.7 sun 0.1 ▪ What is the probability distribution after one step? Source: http://www.doksinet Mini-Forward Algorithm ▪ Question: What’s P(X) on some day t? X1 X2 X3 X4 Forward simulation Source: http://www.doksinet Example Run of Mini-Forward Algorithm ▪ From initial observation of sun P(X1) P(X2) P(X3) P(X4) P(X) P(X4) P(X) ▪ From initial observation of rain P(X1) P(X2) P(X3) ▪ From yet another initial distribution P(X1): P(X1) P(X) [Demo: L13D1,2,3] Source: http://www.doksinet Video of Demo Ghostbusters Basic Dynamics Source: http://www.doksinet Video of Demo Ghostbusters Circular Dynamics Source: http://www.doksinet Video of Demo Ghostbusters Whirlpool Dynamics Source: http://www.doksinet Stationary Distributions ▪ For most chains: ▪ Influence of the initial distribution gets less and less over time. ▪ The distribution we end up in is independent of the initial distribution

▪ Stationary distribution: ▪ The distribution we end up with is called the stationary distribution of the chain ▪ It satisfies Source: http://www.doksinet Example: Stationary Distributions ▪ Question: What’s P(X) at time t = infinity? X1 Also: X2 X3 X4 Xt-1 Xt P(Xt|Xt-1) sun sun 0.9 sun rain 0.1 rain sun 0.3 rain rain 0.7 Source: http://www.doksinet Application of Stationary Distribution: Web Link Analysis ▪ PageRank over a web graph ▪ Each web page is a state ▪ Initial distribution: uniform over pages ▪ Transitions: ▪ With prob. c, uniform jump to a random page (dotted lines, not all shown) ▪ With prob. 1-c, follow a random outlink (solid lines) ▪ Stationary distribution ▪ ▪ ▪ ▪ Will spend more time on highly reachable pages E.g many ways to get to the Acrobat Reader download page Somewhat robust to link spam Google 1.0 returned the set of pages containing all your keywords in decreasing rank, now all search engines use link

analysis along with many other factors (rank actually getting less important over time) Source: http://www.doksinet Next Time: Hidden Markov Models! Source: http://www.doksinet CS 188: Artificial Intelligence Hidden Markov Models Instructors: Dan Klein and Pieter Abbeel --- University of California, Berkeley [These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://aiberkeleyedu] Source: http://www.doksinet Pacman – Sonar (P4) [Demo: Pacman – Sonar – No Beliefs(L14D1)] Source: http://www.doksinet Video of Demo Pacman – Sonar (no beliefs) Source: http://www.doksinet Probability Recap ▪ Conditional probability ▪ Product rule ▪ Chain rule ▪ X, Y independent if and only if: ▪ X and Y are conditionally independent given Z if and only if: Source: http://www.doksinet Hidden Markov Models Source: http://www.doksinet Hidden Markov Models ▪ Markov chains not so useful

for most agents ▪ Need observations to update your beliefs ▪ Hidden Markov models (HMMs) ▪ Underlying Markov chain over states X ▪ You observe outputs (effects) at each time step X1 X2 X3 X4 X5 E1 E2 E3 E4 E5 Source: http://www.doksinet Example: Weather HMM Raint-1 Raint Raint+1 Umbrellat-1 Umbrellat Umbrellat+1 ▪ An HMM is defined by: ▪ Initial distribution: ▪ Transitions: ▪ Emissions: Rt Rt+1 P(Rt+1|Rt) Rt Ut P(Ut|Rt) +r +r 0.7 +r +u 0.9 +r -r 0.3 +r -u 0.1 -r +r 0.3 -r +u 0.2 -r -r 0.7 -r -u 0.8 Source: http://www.doksinet Example: Ghostbusters HMM ▪ P(X1) = uniform 1/9 1/9 1/9 1/9 1/9 1/9 ▪ P(X|X’) = usually move clockwise, but sometimes move in a random direction or stay in place 1/9 1/9 1/9 P(X1) ▪ P(Rij|X) = same sensor model as before: red means close, green means far away. X1 X2 X3 1/6 1/6 1/2 X4 X5 Ri,j Ri,j Ri,j 0 1/6 0 0 0 0 P(X|X’=<1,2>) Ri,j [Demo: Ghostbusters –

Circular Dynamics – HMM (L14D2)] Source: http://www.doksinet Video of Demo Ghostbusters – Circular Dynamics -- HMM Source: http://www.doksinet Joint Distribution of an HMM X1 X2 X3 X5 E1 E2 E3 E5 ▪ Joint distribution: ▪ More generally: ▪ Questions to be resolved: ▪ Does this indeed define a joint distribution? ▪ Can every joint distribution be factored this way, or are we making some assumptions about the joint distribution by using this factorization? Source: http://www.doksinet Chain Rule and HMMs ▪ From the chain rule, every joint distribution over ▪ Assuming that gives us the expression posited on the previous slide: X1 X2 X3 E1 E2 E3 can be written as: Source: http://www.doksinet Chain Rule and HMMs ▪ From the chain rule, every joint distribution over X1 X2 X3 E1 E2 E3 can be written as: ▪ Assuming that for all t: ▪ State independent of all past states and all past evidence given the previous state, i.e: ▪ Evidence

is independent of all past states and all past evidence given the current state, i.e: gives us the expression posited on the earlier slide: Source: http://www.doksinet Implied Conditional Independencies X1 X2 X3 E1 E2 E3 ▪ Many implied conditional independencies, e.g, ▪ To prove them ▪ Approach 1: follow similar (algebraic) approach to what we did in the Markov models lecture ▪ Approach 2: directly from the graph structure (3 lectures from now) ▪ Intuition: If path between U and V goes through W, then [Some fineprint later] Source: http://www.doksinet Real HMM Examples ▪ Speech recognition HMMs: ▪ Observations are acoustic signals (continuous valued) ▪ States are specific positions in specific words (so, tens of thousands) ▪ Machine translation HMMs: ▪ Observations are words (tens of thousands) ▪ States are translation options ▪ Robot tracking: ▪ Observations are range readings (continuous) ▪ States are positions on a map (continuous)
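To make the HMM factorization above concrete, here is a small sketch (not from the course materials) that multiplies out P(x1) P(e1 | x1) P(x2 | x1) P(e2 | x2) ... for the rain/umbrella HMM, using the transition and emission tables from the Weather HMM slide. The initial distribution is not given on that slide, so a uniform P(X1) is assumed here, matching the filtering example that appears later.

# CPTs from the Weather HMM slide; P_init is an assumed uniform initial distribution.
P_init  = {'+r': 0.5, '-r': 0.5}
P_trans = {('+r', '+r'): 0.7, ('-r', '+r'): 0.3,   # key = (x_t, x_{t-1})
           ('+r', '-r'): 0.3, ('-r', '-r'): 0.7}
P_emit  = {('+u', '+r'): 0.9, ('-u', '+r'): 0.1,   # key = (e_t, x_t)
           ('+u', '-r'): 0.2, ('-u', '-r'): 0.8}

def hmm_joint(states, observations):
    # P(x1, e1, ..., xT, eT) = P(x1) * prod_t P(x_t | x_{t-1}) * prod_t P(e_t | x_t)
    p = P_init[states[0]] * P_emit[(observations[0], states[0])]
    for t in range(1, len(states)):
        p *= P_trans[(states[t], states[t - 1])] * P_emit[(observations[t], states[t])]
    return p

# P(+r1, +u1, +r2, +u2) = 0.5 * 0.9 * 0.7 * 0.9 = 0.2835
print(hmm_joint(['+r', '+r'], ['+u', '+u']))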

Source: http://www.doksinet Filtering / Monitoring ▪ Filtering, or monitoring, is the task of tracking the distribution Bt(X) = Pt(Xt | e1, …, et) (the belief state) over time ▪ We start with B1(X) in an initial setting, usually uniform ▪ As time passes, or we get observations, we update B(X) ▪ The Kalman filter was invented in the 60’s and first implemented as a method of trajectory estimation for the Apollo program Source: http://www.doksinet Example: Robot Localization Example from Michael Pfeiffer Prob 0 1 t=0 Sensor model: can read in which directions there is a wall, never more than 1 mistake Motion model: may not execute action with small prob. Source: http://www.doksinet Example: Robot Localization Prob 0 1 t=1 Lighter grey: was possible to get the reading, but less likely b/c required 1 mistake Source: http://www.doksinet Example: Robot Localization Prob 0 1 t=2 Source: http://www.doksinet Example: Robot Localization Prob 0 1 t=3 Source:

http://www.doksinet Example: Robot Localization Prob 0 1 t=4 Source: http://www.doksinet Example: Robot Localization Prob 0 1 t=5 Source: http://www.doksinet Inference: Base Cases X1 X1 E1 X2 Source: http://www.doksinet Passage of Time ▪ Assume we have current belief P(X | evidence to date) X1 X2 ▪ Then, after one time step passes: ▪ Or compactly: ▪ Basic idea: beliefs get “pushed” through the transitions ▪ With the “B” notation, we have to be careful about what time step t the belief is about, and what evidence it includes Source: http://www.doksinet Example: Passage of Time ▪ As time passes, uncertainty “accumulates” T=1 T=2 (Transition model: ghosts usually go clockwise) T=5 Source: http://www.doksinet Observation ▪ Assume we have current belief P(X | previous evidence): X1 ▪ Then, after evidence comes in: E1 ▪ Or, compactly: ▪ Basic idea: beliefs “reweighted” by likelihood of evidence ▪ Unlike passage of

time, we have to renormalize Source: http://www.doksinet Example: Observation ▪ As we get observations, beliefs get reweighted, uncertainty “decreases” Before observation After observation Source: http://www.doksinet Example: Weather HMM B(+r) = 0.5 B(-r) = 0.5 Rain0 B’(+r) = 0.5 B’(-r) = 0.5 B’(+r) = 0.627 B’(-r) = 0.373 B(+r) = 0.818 B(-r) = 0.182 B(+r) = 0.883 B(-r) = 0.117 Rain1 Umbrella1 Rain2 Umbrella2 Rt Rt+1 P(Rt+1|Rt) Rt Ut P(Ut|Rt) +r +r 0.7 +r +u 0.9 +r -r 0.3 +r -u 0.1 -r +r 0.3 -r +u 0.2 -r -r 0.7 -r -u 0.8 Source: http://www.doksinet The Forward Algorithm ▪ We are given evidence at each time and want to know ▪ We can derive the following updates We can normalize as we go if we want to have P(x|e) at each time step, or just once at the end Source: http://www.doksinet Online Belief Updates ▪ Every time step, we start with current P(X | evidence) ▪ We update for time: X1 ▪ We update for evidence:

X2 X2 E2 ▪ The forward algorithm does both at once (and doesn’t normalize) Source: http://www.doksinet Pacman – Sonar (P4) [Demo: Pacman – Sonar – No Beliefs(L14D1)] Source: http://www.doksinet Video of Demo Pacman – Sonar (with beliefs) Source: http://www.doksinet Next Time: Particle Filtering and Applications of HMMs Source: http://www.doksinet CS 188: Artificial Intelligence Particle Filters and Applications of HMMs Instructors: Dan Klein and Pieter Abbeel University of California, Berkeley [These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://aiberkeleyedu] Source: http://www.doksinet Today ▪ HMMs ▪ Particle filters ▪ Demo bonanza! ▪ Most-likely-explanation queries ▪ Applications: ▪ “I Know Why You Went to the Clinic: Risks and Realization of HTTPS Traffic Analysis” ▪ Speech recognition Source: http://www.doksinet [Demo: Ghostbusters Markov

Model (L15D1)] Recap: Reasoning Over Time ▪ Markov models X1 0.3 X2 X3 0.7 X4 rain sun 0.7 0.3 ▪ Hidden Markov models X1 E1 X2 E2 X3 E3 X4 E4 X5 E5 X E P rain umbrella 0.9 rain no umbrella 0.1 sun umbrella 0.2 sun no umbrella 0.8 Source: http://www.doksinet Video of Demo Ghostbusters Markov Model (Reminder) Source: http://www.doksinet Recap: Filtering Elapse time: compute P( Xt | e1:t-1 ) Observe: compute P( Xt | e1:t ) Belief: <P(rain), P(sun)> X1 E1 X2 E2 <0.5, 0.5> Prior on X1 <0.82, 0.18> Observe <0.63, 0.37> Elapse time <0.88, 0.12> Observe [Demo: Ghostbusters Exact Filtering (L15D2)] Source: http://www.doksinet Video of Ghostbusters Exact Filtering (Reminder) Source: http://www.doksinet Particle Filtering Source: http://www.doksinet Particle Filtering ▪ Filtering: approximate solution 0.0 0.1 0.0 0.0 0.0 0.2 0.0 0.2 0.5 ▪ Sometimes |X| is too big to use exact inference ▪

|X| may be too big to even store B(X) ▪ E.g X is continuous ▪ Solution: approximate inference ▪ ▪ ▪ ▪ ▪ Track samples of X, not all values Samples are called particles Time per step is linear in the number of samples But: number needed may be large In memory: list of particles, not states ▪ This is how robot localization works in practice ▪ Particle is just new name for sample Source: http://www.doksinet Representation: Particles ▪ Our representation of P(X) is now a list of N particles (samples) ▪ Generally, N << |X| ▪ Storing map from X to counts would defeat the point ▪ P(x) approximated by number of particles with value x ▪ So, many x may have P(x) = 0! ▪ More particles, more accuracy ▪ For now, all particles have a weight of 1 Particles: (3,3) (2,3) (3,3) (3,2) (3,3) (3,2) (1,2) (3,3) (3,3) (2,3) Source: http://www.doksinet Particle Filtering: Elapse Time ▪ Each particle is moved by sampling its next position from the transition

model ▪ This is like prior sampling – samples’ frequencies reflect the transition probabilities ▪ Here, most samples move clockwise, but some move in another direction or stay in place ▪ This captures the passage of time ▪ If enough samples, close to exact values before and after (consistent) Particles: (3,3) (2,3) (3,3) (3,2) (3,3) (3,2) (1,2) (3,3) (3,3) (2,3) Particles: (3,2) (2,3) (3,2) (3,1) (3,3) (3,2) (1,3) (2,3) (3,2) (2,2) Source: http://www.doksinet Particle Filtering: Observe ▪ Slightly trickier: ▪ Don’t sample observation, fix it ▪ Similar to likelihood weighting, downweight samples based on the evidence ▪ As before, the probabilities don’t sum to one, since all have been downweighted (in fact they now sum to (N times) an approximation of P(e)) Particles: (3,2) (2,3) (3,2) (3,1) (3,3) (3,2) (1,3) (2,3) (3,2) (2,2) Particles: (3,2) w=.9 (2,3) w=.2 (3,2) w=.9 (3,1) w=.4 (3,3) w=.4 (3,2) w=.9 (1,3) w=.1 (2,3) w=.2 (3,2) w=.9 (2,2) w=.4
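A toy sketch (not the project code) of one particle-filter update, combining the elapse-time and observe steps just described with the resampling step covered on the next slide. To keep it self-contained it runs on the rain/umbrella HMM from earlier rather than the ghost grid; the function name and the 100-particle count are arbitrary choices for illustration.

import random

P_trans = {'+r': {'+r': 0.7, '-r': 0.3}, '-r': {'+r': 0.3, '-r': 0.7}}   # P(X_t | X_{t-1})
P_emit  = {'+r': {'+u': 0.9, '-u': 0.1}, '-r': {'+u': 0.2, '-u': 0.8}}   # P(E_t | X_t)

def particle_filter_step(particles, evidence):
    # Elapse time: move each particle by sampling its successor from the transition model.
    particles = [random.choices(list(P_trans[x]), weights=list(P_trans[x].values()))[0]
                 for x in particles]
    # Observe: do not sample the observation; weight each particle by its likelihood.
    weights = [P_emit[x][evidence] for x in particles]
    # Resample: draw N particles with replacement, in proportion to the weights.
    return random.choices(particles, weights=weights, k=len(particles))

particles = ['+r'] * 50 + ['-r'] * 50           # rough uniform prior over the state
for e in ['+u', '+u', '-u']:                    # a short evidence sequence
    particles = particle_filter_step(particles, e)
print(particles.count('+r') / len(particles))   # approximate posterior P(+r | evidence)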

Source: http://www.doksinet Particle Filtering: Resample ▪ Rather than tracking weighted samples, we resample ▪ N times, we choose from our weighted sample distribution (i.e draw with replacement) Particles: (3,2) w=.9 (2,3) w=.2 (3,2) w=.9 (3,1) w=.4 (3,3) w=.4 (3,2) w=.9 (1,3) w=.1 (2,3) w=.2 (3,2) w=.9 (2,2) w=.4 ▪ This is equivalent to renormalizing the distribution ▪ Now the update is complete for this time step, continue with the next one (New) Particles: (3,2) (2,2) (3,2) (2,3) (3,3) (3,2) (1,3) (2,3) (3,2) (3,2) Source: http://www.doksinet Recap: Particle Filtering ▪ Particles: track samples of states rather than an explicit distribution Elapse Particles: (3,3) (2,3) (3,3) (3,2) (3,3) (3,2) (1,2) (3,3) (3,3) (2,3) Weight Particles: (3,2) (2,3) (3,2) (3,1) (3,3) (3,2) (1,3) (2,3) (3,2) (2,2) Resample Particles: (3,2) w=.9 (2,3) w=.2 (3,2) w=.9 (3,1) w=.4 (3,3) w=.4 (3,2) w=.9 (1,3) w=.1 (2,3) w=.2 (3,2) w=.9 (2,2) w=.4 (New) Particles: (3,2) (2,2) (3,2)

(2,3) (3,3) (3,2) (1,3) (2,3) (3,2) (3,2) [Demos: ghostbusters particle filtering (L15D3,4,5)] Source: http://www.doksinet Video of Demo – Moderate Number of Particles Source: http://www.doksinet Video of Demo – One Particle Source: http://www.doksinet Video of Demo – Huge Number of Particles Source: http://www.doksinet Robot Localization ▪ In robot localization: ▪ We know the map, but not the robot’s position ▪ Observations may be vectors of range finder readings ▪ State space and readings are typically continuous (works basically like a very fine grid) and so we cannot store B(X) ▪ Particle filtering is a main technique Source: http://www.doksinet Particle Filter Localization (Sonar) [Video: global-sonar-uw-annotated.avi] Source: http://www.doksinet Particle Filter Localization (Laser) [Video: global-floor.gif] Source: http://www.doksinet Robot Mapping ▪ SLAM: Simultaneous Localization And Mapping ▪ We do not know the map or our location

▪ State consists of position AND map! ▪ Main techniques: Kalman filtering (Gaussian HMMs) and particle methods DP-SLAM, Ron Parr [Demo: PARTICLES-SLAM-mapping1-new.avi] Source: http://www.doksinet Particle Filter SLAM – Video 1 [Demo: PARTICLES-SLAM-mapping1-new.avi] Source: http://www.doksinet Particle Filter SLAM – Video 2 [Demo: PARTICLES-SLAM-fastslam.avi] Source: http://www.doksinet Dynamic Bayes Nets Source: http://www.doksinet Dynamic Bayes Nets (DBNs) ▪ We want to track multiple variables over time, using multiple sources of evidence ▪ Idea: Repeat a fixed Bayes net structure at each time ▪ Variables from time t can condition on those from t-1 t =1 t =2 G 1a E1a G 3a G 2a G 1b E1b t =3 G 3b G 2b E2a E2b E3a E3b ▪ Dynamic Bayes nets are a generalization of HMMs [Demo: pacman sonar ghost DBN model (L15D6)] Source: http://www.doksinet Video of Demo Pacman Sonar Ghost DBN Model Source: http://www.doksinet DBN Particle Filters ▪ A

particle is a complete sample for a time step ▪ Initialize: Generate prior samples for the t=1 Bayes net ▪ Example particle: G1a = (3,3) G1b = (5,3) ▪ Elapse time: Sample a successor for each particle ▪ Example successor: G2a = (2,3) G2b = (6,3) ▪ Observe: Weight each entire sample by the likelihood of the evidence conditioned on the sample ▪ Likelihood: P(E1a |G1a ) * P(E1b |G1b ) ▪ Resample: Select prior samples (tuples of values) in proportion to their likelihood Source: http://www.doksinet Most Likely Explanation Source: http://www.doksinet HMMs: MLE Queries ▪ HMMs defined by ▪ ▪ ▪ ▪ ▪ States X Observations E Initial distribution: Transitions: Emissions: ▪ New query: most likely explanation: ▪ New method: the Viterbi algorithm X1 X2 X3 X4 X5 E1 E2 E3 E4 E5 Source: http://www.doksinet State Trellis ▪ State trellis: graph of states and transitions over time ▪ ▪ ▪ ▪ ▪ sun sun sun sun rain rain rain rain Each arc

represents some transition ▪ Each arc has weight ▪ Each path is a sequence of states ▪ The product of weights on a path is that sequence’s probability along with the evidence ▪ Forward algorithm computes sums of paths, Viterbi computes best paths Source: http://www.doksinet Forward / Viterbi Algorithms sun sun sun sun rain rain rain rain Forward Algorithm (Sum) Viterbi Algorithm (Max) Source: http://www.doksinet AI in the News I Know Why You Went to the Clinic: Risks and Realization of HTTPS Traffic Analysis Brad Miller, Ling Huang, A. D. Joseph, J. D. Tygar (UC Berkeley) Source: http://www.doksinet Challenge ▪ Setting ▪ The user we want to spy on uses HTTPS to browse the internet ▪ Measurements ▪ IP address ▪ Sizes of packets coming in ▪ Goal ▪ Infer browsing sequence of that user ▪ E.g.: medical, financial, legal, … Source: http://www.doksinet HMM ▪ Transition model ▪ Probability distribution over links on the current page + some probability to navigate

to any other page on the site ▪ Noisy observation model due to traffic variations ▪ Caching ▪ Dynamically generated content ▪ User-specific content, including cookies Probability distribution P( packet size | page ) Source: http://www.doksinet Results BoG = described approach, others are prior work Source: http://www.doksinet Results [Figure: accuracy vs. length of browsing session (“Session Length Effect”)] Source: http://www.doksinet Speech Recognition Source: http://www.doksinet Speech Recognition in Action [Video: NLP – ASR tvsample.avi (from Lecture 1)] Source: http://www.doksinet Digitizing Speech Source: http://www.doksinet Speech in an Hour ▪ Speech input is an acoustic waveform s p ee ch l a b “l” to “a” transition: Figure: Simon Arnfield, http://www.psycleedsacuk/research/cogn/speech/tutorial/ Source: http://www.doksinet Spectral Analysis ▪ Frequency gives pitch; amplitude gives volume ▪

Sampling at ~8 kHz (phone), ~16 kHz (mic) (kHz=1000 cycles/sec) s p ee ch l a b ▪ Fourier transform of wave displayed as a spectrogram ▪ Darkness indicates energy at each frequency Human ear figure: depion.blogspotcom Source: http://www.doksinet Part of [ae] from “lab” ▪ Complex wave repeating nine times ▪ Plus smaller wave that repeats 4x for every large cycle ▪ Large wave: freq of 250 Hz (9 times in .036 seconds) ▪ Small wave roughly 4 times this, or roughly 1000 Hz Source: http://www.doksinet Why These Peaks? ▪ Articulator process: ▪ Vocal cord vibrations create harmonics ▪ The mouth is an amplifier ▪ Depending on shape of mouth, some harmonics are amplified more than others Source: http://www.doksinet Resonances of the Vocal Tract ▪ The human vocal tract as an open tube Open end Closed end Length 17.5 cm ▪ Air in a tube of a given length will tend to vibrate at resonance frequency of tube ▪ Constraint: Pressure differential should

be maximal at (closed) glottal end and minimal at (open) lip end Figure: W. Barry Speech Science slides Source: http://www.doksinet Spectrum Shapes Figure: Mark Liberman [Demo: speech synthesis ] Source: http://www.doksinet Video of Demo Speech Synthesis Source: http://www.doksinet Vowel [i] sung at successively higher pitches F#2 A2 C3 F#3 A3 C4 (middle C) A4 Graphs: Ratree Wayland Source: http://www.doksinet Acoustic Feature Sequence ▪ Time slices are translated into acoustic feature vectors (~39 real numbers per slice) .e12e13e14e15e16 ▪ These are the observations E, now we need the hidden states X Source: http://www.doksinet Speech State Space ▪ HMM Specification ▪ P(E|X) encodes which acoustic vectors are appropriate for each phoneme (each kind of sound) ▪ P(X|X’) encodes how sounds can be strung together ▪ State Space ▪ We will have one state for each sound in each word ▪ Mostly, states advance sound by sound ▪ Build a little state

graph for each word and chain them together to form the state space X Source: http://www.doksinet States in a Word Source: http://www.doksinet Training Counts Transitions with a Bigram Model 198015222 the first 194623024 the same 168504105 the following 158562063 the world 14112454 the door ----------------------------------23135851162 the * Figure: Huang et al, p. 618 Source: http://www.doksinet Decoding ▪ Finding the words given the acoustics is an HMM inference problem ▪ Which state sequence x1:T is most likely given the evidence e1:T? ▪ From the sequence x, we can simply read off the words Source: http://www.doksinet Next Time: Bayes’ Nets! Source: http://www.doksinet CS 188: Artificial Intelligence Bayes’ Nets Instructors: Dan Klein and Pieter Abbeel --- University of California, Berkeley [These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://aiberkeleyedu] Source:

http://www.doksinet Probabilistic Models ▪ Models describe how (a portion of) the world works ▪ Models are always simplifications ▪ May not account for every variable ▪ May not account for all interactions between variables ▪ “All models are wrong; but some are useful.” – George E. P Box ▪ What do we do with probabilistic models? ▪ We (or our agents) need to reason about unknown variables, given evidence ▪ Example: explanation (diagnostic reasoning) ▪ Example: prediction (causal reasoning) ▪ Example: value of information Source: http://www.doksinet Independence Source: http://www.doksinet Independence ▪ Two variables are independent if: ▪ This says that their joint distribution factors into a product two simpler distributions ▪ Another form: ▪ We write: ▪ Independence is a simplifying modeling assumption ▪ Empirical joint distributions: at best “close” to independent ▪ What could we assume for {Weather, Traffic, Cavity,

Toothache}? Source: http://www.doksinet Example: Independence? T P hot 0.5 cold 0.5 T W P T W P hot sun 0.4 hot sun 0.3 hot rain 0.1 hot rain 0.2 cold sun 0.2 cold sun 0.3 cold rain 0.3 cold rain 0.2 W P sun 0.6 rain 0.4 Source: http://www.doksinet Example: Independence ▪ N fair, independent coin flips: H 0.5 H 0.5 H 0.5 T 0.5 T 0.5 T 0.5 Source: http://www.doksinet Conditional Independence ▪ P(Toothache, Cavity, Catch) ▪ If I have a cavity, the probability that the probe catches in it doesn’t depend on whether I have a toothache: ▪ P(+catch | +toothache, +cavity) = P(+catch | +cavity) ▪ The same independence holds if I don’t have a cavity: ▪ P(+catch | +toothache, -cavity) = P(+catch | -cavity) ▪ Catch is conditionally independent of Toothache given Cavity: ▪ P(Catch | Toothache, Cavity) = P(Catch | Cavity) ▪ Equivalent statements: ▪ P(Toothache | Catch, Cavity) = P(Toothache | Cavity) ▪

P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity) ▪ One can be derived from the other easily Source: http://www.doksinet Conditional Independence ▪ Unconditional (absolute) independence very rare (why?) ▪ Conditional independence is our most basic and robust form of knowledge about uncertain environments. ▪ X is conditionally independent of Y given Z if and only if: or, equivalently, if and only if Source: http://www.doksinet Conditional Independence ▪ What about this domain: ▪ Traffic ▪ Umbrella ▪ Raining Source: http://www.doksinet Conditional Independence ▪ What about this domain: ▪ Fire ▪ Smoke ▪ Alarm Source: http://www.doksinet Conditional Independence and the Chain Rule ▪ Chain rule: ▪ Trivial decomposition: ▪ With assumption of conditional independence: ▪ Bayes’nets / graphical models help us express conditional independence assumptions Source: http://www.doksinet Ghostbusters Chain Rule ▪ Each sensor

depends only on where the ghost is ▪ That means, the two sensors are conditionally independent, given the ghost position ▪ T: Top square is red B: Bottom square is red G: Ghost is in the top ▪ Givens: P( +g ) = 0.5 P( -g ) = 0.5 P( +t | +g ) = 0.8 P( +t | -g ) = 0.4 P( +b | +g ) = 0.4 P( +b | -g ) = 0.8 P(T,B,G) = P(G) P(T|G) P(B|G) T B G P(T,B,G) +t +b +g 0.16 +t +b -g 0.16 +t -b +g 0.24 +t -b -g 0.04 -t +b +g 0.04 -t +b -g 0.24 -t -b +g 0.06 -t -b -g 0.06 Source: http://www.doksinet Bayes’Nets: Big Picture Source: http://www.doksinet Bayes’ Nets: Big Picture ▪ Two problems with using full joint distribution tables as our probabilistic models: ▪ Unless there are only a few variables, the joint is WAY too big to represent explicitly ▪ Hard to learn (estimate) anything empirically about more than a few variables at a time ▪ Bayes’ nets: a technique for describing complex joint distributions (models) using simple, local

distributions (conditional probabilities) ▪ More properly called graphical models ▪ We describe how variables locally interact ▪ Local interactions chain together to give global, indirect interactions ▪ For about 10 min, we’ll be vague about how these interactions are specified Source: http://www.doksinet Example Bayes’ Net: Insurance Source: http://www.doksinet Example Bayes’ Net: Car Source: http://www.doksinet Graphical Model Notation ▪ Nodes: variables (with domains) ▪ Can be assigned (observed) or unassigned (unobserved) ▪ Arcs: interactions ▪ Similar to CSP constraints ▪ Indicate “direct influence” between variables ▪ Formally: encode conditional independence (more later) ▪ For now: imagine that arrows mean direct causation (in general, they don’t!) Source: http://www.doksinet Example: Coin Flips ▪ N independent coin flips X1 X2 Xn ▪ No interactions between variables: absolute independence Source: http://www.doksinet
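As a numeric check of the Ghostbusters chain-rule slide a few slides back, the sketch below (illustrative only, not course code) builds the full joint P(T, B, G) = P(G) P(T | G) P(B | G) from the given conditionals; complements such as P(-t | +g) = 0.2 follow directly from the givens. The printed values reproduce the table on that slide, e.g. P(+t, +b, +g) = 0.5 * 0.8 * 0.4 = 0.16.

P_G = {'+g': 0.5, '-g': 0.5}
P_T_given_G = {('+t', '+g'): 0.8, ('-t', '+g'): 0.2, ('+t', '-g'): 0.4, ('-t', '-g'): 0.6}
P_B_given_G = {('+b', '+g'): 0.4, ('-b', '+g'): 0.6, ('+b', '-g'): 0.8, ('-b', '-g'): 0.2}

# The sensors are conditionally independent given G, so the full joint is a product
# of small local tables instead of one 8-entry table specified directly.
joint = {(t, b, g): P_G[g] * P_T_given_G[(t, g)] * P_B_given_G[(b, g)]
         for t in ('+t', '-t') for b in ('+b', '-b') for g in ('+g', '-g')}

for assignment, p in sorted(joint.items()):
    print(assignment, round(p, 2))   # e.g. ('+t', '+b', '+g') 0.16, matching the slide's table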

Example: Traffic ▪ Variables: ▪ R: It rains ▪ T: There is traffic ▪ Model 1: independence ▪ Model 2: rain causes traffic R R T T ▪ Why is an agent using model 2 better? Source: http://www.doksinet Example: Traffic II ▪ Let’s build a causal graphical model! ▪ Variables ▪ ▪ ▪ ▪ ▪ ▪ T: Traffic R: It rains L: Low pressure D: Roof drips B: Ballgame C: Cavity Source: http://www.doksinet Example: Alarm Network ▪ Variables ▪ ▪ ▪ ▪ ▪ B: Burglary A: Alarm goes off M: Mary calls J: John calls E: Earthquake! Source: http://www.doksinet Bayes’ Net Semantics Source: http://www.doksinet Bayes’ Net Semantics ▪ A set of nodes, one per variable X ▪ A directed, acyclic graph A1 An ▪ A conditional distribution for each node ▪ A collection of distributions over X, one for each combination of parents’ values X ▪ CPT: conditional probability table ▪ Description of a noisy “causal” process A Bayes net = Topology (graph)

+ Local Conditional Probabilities Source: http://www.doksinet Probabilities in BNs ▪ Bayes’ nets implicitly encode joint distributions ▪ As a product of local conditional distributions ▪ To see what probability a BN gives to a full assignment, multiply all the relevant conditionals together: ▪ Example: Source: http://www.doksinet Probabilities in BNs ▪ Why are we guaranteed that setting results in a proper joint distribution? ▪ Chain rule (valid for all distributions): ▪ Assume conditional independences: Consequence: ▪ Not every BN can represent every joint distribution ▪ The topology enforces certain conditional independencies Source: http://www.doksinet Example: Coin Flips X1 X2 Xn h 0.5 h 0.5 h 0.5 t 0.5 t 0.5 t 0.5 Only distributions whose variables are absolutely independent can be represented by a Bayes’ net with no arcs. Source: http://www.doksinet Example: Traffic R +r T -r +r 1/4 -r 3/4 +t 3/4 -t 1/4 +t 1/2 -t

1/2 Source: http://www.doksinet Example: Alarm Network B P(B) +b 0.001 -b 0.999 Burglary Earthqk E P(E) +e 0.002 -e 0.998 Alarm John calls Mary calls B E A P(A|B,E) +b +e +a 0.95 +b +e -a 0.05 +b -e +a 0.94 A J P(J|A) A M P(M|A) +b -e -a 0.06 +a +j 0.9 +a +m 0.7 -b +e +a 0.29 +a -j 0.1 +a -m 0.3 -b +e -a 0.71 -a +j 0.05 -a +m 0.01 -b -e +a 0.001 -a -j 0.95 -a -m 0.99 -b -e -a 0.999 Source: http://www.doksinet Example: Traffic ▪ Causal direction R +r T -r +r 1/4 -r 3/4 +t 3/4 -t 1/4 +t 1/2 -t 1/2 +r +t 3/16 +r -t 1/16 -r +t 6/16 -r -t 6/16 Source: http://www.doksinet Example: Reverse Traffic ▪ Reverse causality? T +t R -t +t 9/16 -t 7/16 +r 1/3 -r 2/3 +r 1/7 -r 6/7 +r +t 3/16 +r -t 1/16 -r +t 6/16 -r -t 6/16 Source: http://www.doksinet Causality? ▪ When Bayes’ nets reflect the true causal patterns: ▪ Often simpler (nodes have

fewer parents) ▪ Often easier to think about ▪ Often easier to elicit from experts ▪ BNs need not actually be causal ▪ Sometimes no causal net exists over the domain (especially if variables are missing) ▪ E.g consider the variables Traffic and Drips ▪ End up with arrows that reflect correlation, not causation ▪ What do the arrows really mean? ▪ Topology may happen to encode causal structure ▪ Topology really encodes conditional independence Source: http://www.doksinet Bayes’ Nets ▪ So far: how a Bayes’ net encodes a joint distribution ▪ Next: how to answer queries about that distribution ▪ Today: ▪ First assembled BNs using an intuitive notion of conditional independence as causality ▪ Then saw that key property is conditional independence ▪ Main goal: answer queries about conditional independence and influence ▪ After that: how to answer numerical queries (inference) Source: http://www.doksinet CS 188: Artificial Intelligence Bayes’

Nets: Independence Instructors: Dan Klein and Pieter Abbeel --- University of California, Berkeley [These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://aiberkeleyedu] Source: http://www.doksinet Probability Recap ▪ Conditional probability ▪ Product rule ▪ Chain rule ▪ X, Y independent if and only if: ▪ X and Y are conditionally independent given Z if and only if: Source: http://www.doksinet Bayes’ Nets ▪ A Bayes’ net is an efficient encoding of a probabilistic model of a domain ▪ Questions we can ask: ▪ Inference: given a fixed BN, what is P(X | e)? ▪ Representation: given a BN graph, what kinds of distributions can it encode? ▪ Modeling: what BN is most appropriate for a given domain? Source: http://www.doksinet Bayes’ Net Semantics ▪ A directed, acyclic graph, one node per random variable ▪ A conditional probability table (CPT) for each node ▪ A

collection of distributions over X, one for each combination of parents’ values ▪ Bayes’ nets implicitly encode joint distributions ▪ As a product of local conditional distributions ▪ To see what probability a BN gives to a full assignment, multiply all the relevant conditionals together: Source: http://www.doksinet Example: Alarm Network B P(B) +b 0.001 -b 0.999 A J P(J|A) +a +j +a B E A E P(E) +e 0.002 -e 0.998 A M P(M|A) 0.9 +a +m 0.7 -j 0.1 +a -m 0.3 -a +j 0.05 -a +m 0.01 -a -j 0.95 -a -m 0.99 J M B E A P(A|B,E) +b +e +a 0.95 +b +e -a 0.05 +b -e +a 0.94 +b -e -a 0.06 -b +e +a 0.29 -b +e -a 0.71 -b -e +a 0.001 -b -e -a 0.999 Source: http://www.doksinet Example: Alarm Network B P(B) +b 0.001 -b 0.999 A J P(J|A) +a +j +a B E A E P(E) +e 0.002 -e 0.998 A M P(M|A) 0.9 +a +m 0.7 -j 0.1 +a -m 0.3 -a +j 0.05 -a +m 0.01 -a -j 0.95 -a -m 0.99 J M B

E A P(A|B,E) +b +e +a 0.95 +b +e -a 0.05 +b -e +a 0.94 +b -e -a 0.06 -b +e +a 0.29 -b +e -a 0.71 -b -e +a 0.001 -b -e -a 0.999 Source: http://www.doksinet Size of a Bayes’ Net ▪ How big is a joint distribution over N Boolean variables? 2N ▪ How big is an N-node net if nodes have up to k parents? O(N * 2k+1) ▪ Both give you the power to calculate ▪ BNs: Huge space savings! ▪ Also easier to elicit local CPTs ▪ Also faster to answer queries (coming) Source: http://www.doksinet Bayes’ Nets ▪ Representation ▪ Conditional Independences ▪ Probabilistic Inference ▪ Learning Bayes’ Nets from Data Source: http://www.doksinet Conditional Independence ▪ X and Y are independent if ▪ X and Y are conditionally independent given Z ▪ (Conditional) independence is a property of a distribution ▪ Example: Source: http://www.doksinet Bayes Nets: Assumptions ▪ Assumptions we are required to make to define the Bayes net

when given the graph: ▪ Beyond above “chain rule Bayes net” conditional independence assumptions ▪ Often additional conditional independences ▪ They can be read off the graph ▪ Important for modeling: understand assumptions made when choosing a Bayes net graph Source: http://www.doksinet Example X Y Z W ▪ Conditional independence assumptions directly from simplifications in chain rule: ▪ Additional implied conditional independence assumptions? Source: http://www.doksinet Independence in a BN ▪ Important question about a BN: ▪ ▪ ▪ ▪ Are two nodes independent given certain evidence? If yes, can prove using algebra (tedious in general) If no, can prove with a counter example Example: X Y Z ▪ Question: are X and Z necessarily independent? ▪ Answer: no. Example: low pressure causes rain, which causes traffic ▪ X can influence Z, Z can influence X (via Y) ▪ Addendum: they could be independent: how? Source: http://www.doksinet
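A worked instance of “multiply all the relevant conditionals together”, using the alarm-network CPTs shown above. This is only a sketch: the dictionaries below contain just the entries needed for this one full assignment.

# Selected CPT entries from the alarm-network slides.
P_B = {'+b': 0.001}
P_E = {'-e': 0.998}
P_A_given_BE = {('+a', '+b', '-e'): 0.94}
P_J_given_A  = {('+j', '+a'): 0.9}
P_M_given_A  = {('+m', '+a'): 0.7}

# P(+b, -e, +a, +j, +m) = P(+b) P(-e) P(+a | +b, -e) P(+j | +a) P(+m | +a)
p = (P_B['+b'] * P_E['-e'] * P_A_given_BE[('+a', '+b', '-e')]
     * P_J_given_A[('+j', '+a')] * P_M_given_A[('+m', '+a')])
print(p)   # 0.001 * 0.998 * 0.94 * 0.9 * 0.7 ≈ 0.000591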

D-separation: Outline Source: http://www.doksinet D-separation: Outline ▪ Study independence properties for triples ▪ Analyze complex cases in terms of member triples ▪ D-separation: a condition / algorithm for answering such queries Source: http://www.doksinet Causal Chains ▪ This configuration is a “causal chain” ▪ Guaranteed X independent of Z ? No! ▪ One example set of CPTs for which X is not independent of Z is sufficient to show this independence is not guaranteed. ▪ Example: ▪ Low pressure causes rain causes traffic, high pressure causes no rain causes no traffic X: Low pressure Y: Rain Z: Traffic ▪ In numbers: P( +y | +x ) = 1, P( -y | - x ) = 1, P( +z | +y ) = 1, P( -z | -y ) = 1 Source: http://www.doksinet Causal Chains ▪ This configuration is a “causal chain” X: Low pressure Y: Rain ▪ Guaranteed X independent of Z given Y? Z: Traffic Yes! ▪ Evidence along the chain “blocks” the influence Source: http://www.doksinet
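The causal-chain counterexample above can be checked numerically. The sketch below (illustration only) uses the deterministic CPTs from the slide plus an assumed prior P(+x) = 0.5, which the slide does not specify, and confirms that X and Z are dependent, while X is independent of Z given Y.

# Deterministic CPTs from the causal-chain slide; P_X is an assumed prior.
P_X = {'+x': 0.5, '-x': 0.5}
P_Y_X = {('+y', '+x'): 1.0, ('-y', '+x'): 0.0, ('+y', '-x'): 0.0, ('-y', '-x'): 1.0}
P_Z_Y = {('+z', '+y'): 1.0, ('-z', '+y'): 0.0, ('+z', '-y'): 0.0, ('-z', '-y'): 1.0}

joint = {(x, y, z): P_X[x] * P_Y_X[(y, x)] * P_Z_Y[(z, y)]
         for x in ('+x', '-x') for y in ('+y', '-y') for z in ('+z', '-z')}

def p(pred):
    return sum(pr for xyz, pr in joint.items() if pred(*xyz))

# Not independent: P(+z | +x) = 1.0, but P(+z) = 0.5.
print(p(lambda x, y, z: z == '+z' and x == '+x') / p(lambda x, y, z: x == '+x'))
print(p(lambda x, y, z: z == '+z'))
# Conditionally independent given Y: P(+z | +y, +x) = P(+z | +y) = 1.0.
print(p(lambda x, y, z: z == '+z' and y == '+y' and x == '+x') / p(lambda x, y, z: y == '+y' and x == '+x'))
print(p(lambda x, y, z: z == '+z' and y == '+y') / p(lambda x, y, z: y == '+y'))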

Common Cause ▪ This configuration is a “common cause” ▪ Guaranteed X independent of Z ? No! ▪ One example set of CPTs for which X is not independent of Z is sufficient to show this independence is not guaranteed. Y: Project due ▪ Example: ▪ Project due causes both forums busy and lab full ▪ In numbers: X: Forums busy Z: Lab full P( +x | +y ) = 1, P( -x | -y ) = 1, P( +z | +y ) = 1, P( -z | -y ) = 1 Source: http://www.doksinet Common Cause ▪ This configuration is a “common cause” ▪ Guaranteed X and Z independent given Y? Y: Project due X: Forums busy Z: Lab full Yes! ▪ Observing the cause blocks influence between effects. Source: http://www.doksinet Common Effect ▪ Last configuration: two causes of one effect (v-structures) X: Raining Y: Ballgame ▪ Are X and Y independent? ▪ Yes: the ballgame and the rain cause traffic, but they are not correlated ▪ Still need to prove they must be (try it!) ▪ Are X and Y independent given Z? ▪

No: seeing traffic puts the rain and the ballgame in competition as explanation. ▪ This is backwards from the other cases Z: Traffic ▪ Observing an effect activates influence between possible causes. Source: http://www.doksinet The General Case Source: http://www.doksinet The General Case ▪ General question: in a given BN, are two variables independent (given evidence)? ▪ Solution: analyze the graph ▪ Any complex example can be broken into repetitions of the three canonical cases Source: http://www.doksinet Reachability ▪ Recipe: shade evidence nodes, look for paths in the resulting graph L ▪ Attempt 1: if two nodes are connected by an undirected path not blocked by a shaded node, they are conditionally independent R ▪ Almost works, but not quite ▪ Where does it break? ▪ Answer: the v-structure at T doesn’t count as a link in a path unless “active” D B T Source: http://www.doksinet Active / Inactive Paths ▪ Question: Are X and Y

conditionally independent given evidence variables {Z}? ▪ Yes, if X and Y “d-separated” by Z ▪ Consider all (undirected) paths from X to Y ▪ No active paths = independence! ▪ A path is active if each triple is active: ▪ Causal chain A B C where B is unobserved (either direction) ▪ Common cause A  B C where B is unobserved ▪ Common effect (aka v-structure) A B  C where B or one of its descendents is observed ▪ All it takes to block a path is a single inactive segment Active Triples Inactive Triples Source: http://www.doksinet D-Separation ▪ Query: ▪ Check all (undirected!) paths between ? and ▪ If one or more active, then independence not guaranteed ▪ Otherwise (i.e if all paths are inactive), then independence is guaranteed Source: http://www.doksinet Example Yes R B T T’ Source: http://www.doksinet Example L Yes R Yes D B T Yes T’ Source: http://www.doksinet Example ▪ Variables: ▪ ▪ ▪ ▪ R: Raining T:

Traffic D: Roof drips S: I’m sad R T D ▪ Questions: S Yes Source: http://www.doksinet Structure Implications ▪ Given a Bayes net structure, can run dseparation algorithm to build a complete list of conditional independences that are necessarily true of the form ▪ This list determines the set of probability distributions that can be represented Source: http://www.doksinet Computing All Independences Y X Z Y X Z X Z Y Y X Z Source: http://www.doksinet Topology Limits Distributions ▪ Given some graph topology G, only certain joint distributions can be encoded Y Y X X Z Y ▪ The graph structure guarantees certain (conditional) independences X ▪ (There might be more independence) X ▪ Adding arcs increases the set of distributions, but has several costs ▪ Full conditioning can encode any distribution Z Z Y Y Y X Z X X Y Z X Y Y Z X Z Z Y Z X Z Source: http://www.doksinet Bayes Nets Representation Summary ▪ Bayes nets

compactly encode joint distributions ▪ Guaranteed independencies of distributions can be deduced from BN graph structure ▪ D-separation gives precise conditional independence guarantees from graph alone ▪ A Bayes’ net’s joint distribution may have further (conditional) independence that is not detectable until you inspect its specific distribution Source: http://www.doksinet Bayes’ Nets ▪ Representation ▪ Conditional Independences ▪ Probabilistic Inference ▪ Enumeration (exact, exponential complexity) ▪ Variable elimination (exact, worst-case exponential complexity, often better) ▪ Probabilistic inference is NP-complete ▪ Sampling (approximate) ▪ Learning Bayes’ Nets from Data Source: http://www.doksinet Have a great Spring Break! Source: http://www.doksinet CS 188: Artificial Intelligence Bayes’ Nets: Inference Instructors: Dan Klein and Pieter Abbeel --- University of California, Berkeley [These slides were created by Dan Klein and Pieter

Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://aiberkeleyedu] Source: http://www.doksinet Bayes’ Net Representation ▪ A directed, acyclic graph, one node per random variable ▪ A conditional probability table (CPT) for each node ▪ A collection of distributions over X, one for each combination of parents’ values ▪ Bayes’ nets implicitly encode joint distributions ▪ As a product of local conditional distributions ▪ To see what probability a BN gives to a full assignment, multiply all the relevant conditionals together: Source: http://www.doksinet Example: Alarm Network B P(B) +b 0.001 -b 0.999 Burglary Earthqk E P(E) +e 0.002 -e 0.998 Alarm John calls Mary calls B E A P(A|B,E) +b +e +a 0.95 +b +e -a 0.05 +b -e +a 0.94 A J P(J|A) A M P(M|A) +b -e -a 0.06 +a +j 0.9 +a +m 0.7 -b +e +a 0.29 +a -j 0.1 +a -m 0.3 -b +e -a 0.71 -a +j 0.05 -a +m 0.01 -b -e +a

0.001 -a -j 0.95 -a -m 0.99 -b -e -a 0.999 [Demo: BN Applet] Source: http://www.doksinet Video of Demo BN Applet Source: http://www.doksinet Example: Alarm Network B P(B) +b 0.001 -b 0.999 A J P(J|A) +a +j +a B E A E P(E) +e 0.002 -e 0.998 A M P(M|A) 0.9 +a +m 0.7 -j 0.1 +a -m 0.3 -a +j 0.05 -a +m 0.01 -a -j 0.95 -a -m 0.99 J M B E A P(A|B,E) +b +e +a 0.95 +b +e -a 0.05 +b -e +a 0.94 +b -e -a 0.06 -b +e +a 0.29 -b +e -a 0.71 -b -e +a 0.001 -b -e -a 0.999 Source: http://www.doksinet Example: Alarm Network B P(B) +b 0.001 -b 0.999 A J P(J|A) +a +j +a B E A E P(E) +e 0.002 -e 0.998 A M P(M|A) 0.9 +a +m 0.7 -j 0.1 +a -m 0.3 -a +j 0.05 -a +m 0.01 -a -j 0.95 -a -m 0.99 J M B E A P(A|B,E) +b +e +a 0.95 +b +e -a 0.05 +b -e +a 0.94 +b -e -a 0.06 -b +e +a 0.29 -b +e -a 0.71 -b -e +a 0.001 -b -e -a 0.999 Source:

http://www.doksinet Bayes’ Nets ▪ Representation ▪ Conditional Independences ▪ Probabilistic Inference ▪ Enumeration (exact, exponential complexity) ▪ Variable elimination (exact, worst-case exponential complexity, often better) ▪ Inference is NP-complete ▪ Sampling (approximate) ▪ Learning Bayes’ Nets from Data Source: http://www.doksinet Inference ▪ Inference: calculating some useful quantity from a joint probability distribution ▪ Examples: ▪ Posterior probability ▪ Most likely explanation: Source: http://www.doksinet Inference by Enumeration ▪ General case: ▪ Evidence variables: ▪ Query* variable: ▪ Hidden variables: ▪ Step 1: Select the entries consistent with the evidence ▪ We want: * Works fine with multiple query variables, too All variables ▪ Step 2: Sum out H to get joint of Query and evidence ▪ Step 3: Normalize Source: http://www.doksinet Inference by Enumeration in Bayes’ Net ▪ Given unlimited time,

inference in BNs is easy ▪ Reminder of inference by enumeration by example: B E A J M Source: http://www.doksinet Inference by Enumeration? Source: http://www.doksinet Inference by Enumeration vs. Variable Elimination ▪ Why is inference by enumeration so slow? ▪ You join up the whole joint distribution before you sum out the hidden variables ▪ Idea: interleave joining and marginalizing! ▪ Called “Variable Elimination” ▪ Still NP-hard, but usually much faster than inference by enumeration ▪ First we’ll need some new notation: factors Source: http://www.doksinet Factor Zoo Source: http://www.doksinet Factor Zoo I ▪ Joint distribution: P(X,Y) ▪ Entries P(x,y) for all x, y ▪ Sums to 1 T W P hot sun 0.4 hot rain 0.1 cold sun 0.2 cold rain 0.3 T W P cold sun 0.2 cold rain 0.3 ▪ Selected joint: P(x,Y) ▪ A slice of the joint distribution ▪ Entries P(x,y) for fixed x, all y ▪ Sums to P(x) ▪ Number of capitals =

dimensionality of the table Source: http://www.doksinet Factor Zoo II ▪ Single conditional: P(Y | x) ▪ Entries P(y | x) for fixed x, all y ▪ Sums to 1 ▪ Family of conditionals: P(X |Y) ▪ Multiple conditionals ▪ Entries P(x | y) for all x, y ▪ Sums to |Y| T W P cold sun 0.4 cold rain 0.6 T W P hot sun 0.8 hot rain 0.2 cold sun 0.4 cold rain 0.6 Source: http://www.doksinet Factor Zoo III ▪ Specified family: P( y | X ) ▪ Entries P(y | x) for fixed y, but for all x ▪ Sums to who knows! T W P hot rain 0.2 cold rain 0.6 Source: http://www.doksinet Factor Zoo Summary ▪ In general, when we write P(Y1 YN | X1 XM) ▪ It is a “factor,” a multi-dimensional array ▪ Its values are P(y1 yN | x1 xM) ▪ Any assigned (=lower-case) X or Y is a dimension missing (selected) from the array Source: http://www.doksinet Example: Traffic Domain ▪ Random Variables ▪ R: Raining ▪ T: Traffic ▪ L: Late for class! +r -r R T

0.1 0.9 +r +r -r -r +t -t +t -t 0.8 0.2 0.1 0.9 +t +t -t -t +l -l +l -l 0.3 0.7 0.1 0.9 L Source: http://www.doksinet Inference by Enumeration: Procedural Outline ▪ Track objects called factors ▪ Initial factors are local CPTs (one per node) +r -r 0.1 0.9 +r +r -r -r +t -t +t -t 0.8 0.2 0.1 0.9 +t +t -t -t +l -l +l -l 0.3 0.7 0.1 0.9 ▪ Any known values are selected ▪ E.g if we know +r -r 0.1 0.9 , the initial factors are +r +r -r -r +t -t +t -t 0.8 0.2 0.1 0.9 +t -t +l +l 0.3 0.1 ▪ Procedure: Join all factors, then eliminate all hidden variables Source: http://www.doksinet Operation 1: Join Factors ▪ First basic operation: joining factors ▪ Combining factors: ▪ Just like a database join ▪ Get all factors over the joining variable ▪ Build a new factor over the union of the variables involved ▪ Example: Join on R R +r -r T 0.1 0.9 +r +r -r -r ▪ Computation for each entry: pointwise products +t -t +t -t 0.8 0.2 0.1 0.9 +r +r -r

-r +t -t +t -t 0.08 0.02 0.09 0.81 R,T Source: http://www.doksinet Example: Multiple Joins Source: http://www.doksinet Example: Multiple Joins R T L +r -r +r +r -r -r 0.1 0.9 +t -t +t -t 0.8 0.2 0.1 0.9 Join R +r +t 0.08 +r -t 0.02 -r +t 0.09 -r -t 0.81 Join T R, T L +t +l 0.3 +t -l 0.7 -t +l 0.1 -t -l 0.9 +t +l 0.3 +t -l 0.7 -t +l 0.1 -t -l 0.9 R, T, L +r +r +r +r -r -r -r -r +t +t -t -t +t +t -t -t +l -l +l -l +l -l +l -l 0.024 0.056 0.002 0.018 0.027 0.063 0.081 0.729 Source: http://www.doksinet Operation 2: Eliminate ▪ Second basic operation: marginalization ▪ Take a factor and sum out a variable ▪ Shrinks a factor to a smaller one ▪ A projection operation ▪ Example: +r +t 0.08 +r -t 0.02 -r +t 0.09 -r -t 0.81 +t -t 0.17 0.83 Source: http://www.doksinet Multiple Elimination R, T, L +r +r +r +r -r -r -r -r +t +t -t -t +t +t -t -t +l -l +l -l +l -l +l -l T, L 0.024 0.056 0.002 0.018 0.027 0.063 0.081 0.729 L Sum out T Sum out R +t

+l 0.051 +t -l 0.119 -t +l 0.083 -t -l 0.747 +l -l 0.134 0.866 Source: http://www.doksinet Thus Far: Multiple Join, Multiple Eliminate (= Inference by Enumeration) Source: http://www.doksinet Marginalizing Early (= Variable Elimination) Source: http://www.doksinet Traffic Domain R T ▪ Inference by Enumeration ▪ Variable Elimination L Join on r Join on r Join on t Eliminate r Eliminate t Eliminate r Join on t Eliminate t Source: http://www.doksinet Marginalizing Early! (aka VE) Join R +r -r R T L +r +r -r -r 0.1 0.9 +t -t +t -t 0.8 0.2 0.1 0.9 +t +l 0.3 +t -l 0.7 -t +l 0.1 -t -l 0.9 Sum out R +r +t 0.08 +r -t 0.02 -r +t 0.09 -r -t 0.81 +t -t 0.17 0.83 R, T T L L +t +l 0.3 +t -l 0.7 -t +l 0.1 -t -l 0.9 Sum out T Join T +t +l 0.3 +t -l 0.7 -t +l 0.1 -t -l 0.9 T, L +t +l 0.051 +t -l 0.119 -t +l 0.083 -t -l 0.747 L +l -l 0.134 0.866 Source: http://www.doksinet Evidence ▪ If evidence, start with factors that select that evidence ▪ No

evidence uses these initial factors: +r -r 0.1 0.9 +r +r -r -r ▪ Computing +r 0.1 +t -t +t -t 0.8 0.2 0.1 0.9 +t +t -t -t +l -l +l -l 0.3 0.7 0.1 0.9 , the initial factors become: +r +r +t -t 0.8 0.2 +t +t -t -t +l -l +l -l 0.3 0.7 0.1 0.9 ▪ We eliminate all vars other than query + evidence Source: http://www.doksinet Evidence II ▪ Result will be a selected joint of query and evidence ▪ E.g for P(L | +r), we would end up with: Normalize +r +l +r -l 0.026 0.074 ▪ To get our answer, just normalize this! ▪ That ’s it! +l 0.26 -l 0.74 Source: http://www.doksinet General Variable Elimination ▪ Query: ▪ Start with initial factors: ▪ Local CPTs (but instantiated by evidence) ▪ While there are still hidden variables (not Q or evidence): ▪ Pick a hidden variable H ▪ Join all factors mentioning H ▪ Eliminate (sum out) H ▪ Join all remaining factors and normalize Source: http://www.doksinet Example Choose A Source:

http://www.doksinet Example Choose E Finish with B Normalize Source: http://www.doksinet Same Example in Equations marginal can be obtained from joint by summing out use Bayes’ net joint distribution expression use x*(y+z) = xy + xz joining on a, and then summing out gives f1 use x*(y+z) = xy + xz joining on e, and then summing out gives f2 All we are doing is exploiting uwy + uwz + uxy + uxz + vwy + vwz + vxy +vxz = (u+v)(w+x)(y+z) to improve computational efficiency! Source: http://www.doksinet Another Variable Elimination Example Computational complexity critically depends on the largest factor being generated in this process. Size of factor = number of entries in table. In example above (assuming binary) all factors generated are of size 2 --- as they all only have one variable (Z, Z, and X3 respectively). Source: http://www.doksinet Variable Elimination Ordering ▪ For the query P(Xn|y1,,yn) work through the following two different orderings as done in previous

slide: Z, X1, , Xn-1 and X1, , Xn-1, Z. What is the size of the maximum factor generated for each of the orderings? ▪ Answer: 2n+1 versus 22 (assuming binary) ▪ In general: the ordering can greatly affect efficiency. Source: http://www.doksinet VE: Computational and Space Complexity ▪ The computational and space complexity of variable elimination is determined by the largest factor ▪ The elimination ordering can greatly affect the size of the largest factor. ▪ E.g, previous slide’s example 2n vs 2 ▪ Does there always exist an ordering that only results in small factors? ▪ No! Source: http://www.doksinet Worst Case Complexity? ▪ CSP: ▪ If we can answer P(z) equal to zero or not, we answered whether the 3-SAT problem has a solution. ▪ Hence inference in Bayes’ nets is NP-hard. No known efficient probabilistic inference in general Source: http://www.doksinet Polytrees ▪ A polytree is a directed graph with no undirected cycles ▪ For

poly-trees you can always find an ordering that is efficient ▪ Try it!! ▪ Cut-set conditioning for Bayes’ net inference ▪ Choose set of variables such that if removed only a polytree remains ▪ Exercise: Think about how the specifics would work out! Source: http://www.doksinet Bayes’ Nets ▪ Representation ▪ Conditional Independences ▪ Probabilistic Inference ▪ Enumeration (exact, exponential complexity) ▪ Variable elimination (exact, worst-case exponential complexity, often better) ▪ Inference is NP-complete ▪ Sampling (approximate) ▪ Learning Bayes’ Nets from Data Source: http://www.doksinet CS 188: Artificial Intelligence Bayes’ Nets: Sampling Instructors: Dan Klein and Pieter Abbeel --- University of California, Berkeley [These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://aiberkeleyedu] Source: http://www.doksinet Bayes’ Net Representation ▪ A

directed, acyclic graph, one node per random variable ▪ A conditional probability table (CPT) for each node ▪ A collection of distributions over X, one for each combination of parents’ values ▪ Bayes’ nets implicitly encode joint distributions ▪ As a product of local conditional distributions ▪ To see what probability a BN gives to a full assignment, multiply all the relevant conditionals together: Source: http://www.doksinet Variable Elimination ▪ Interleave joining and marginalizing ▪ dk entries computed for a factor over k variables with domain sizes d ▪ Ordering of elimination of hidden variables can affect size of factors generated ▪ Worst case: running time exponential in the size of the Bayes’ net Source: http://www.doksinet Approximate Inference: Sampling Source: http://www.doksinet Sampling ▪ Sampling is a lot like repeated simulation ▪ Predicting the weather, basketball games, ▪ Basic idea ▪ Draw N samples from a sampling

distribution S ▪ Compute an approximate posterior probability ▪ Show this converges to the true probability P ▪ Why sample? ▪ Learning: get samples from a distribution you don’t know ▪ Inference: getting a sample is faster than computing the right answer (e.g with variable elimination) Source: http://www.doksinet Sampling ▪ Sampling from given distribution ▪ Step 1: Get sample u from uniform distribution over [0, 1) ▪ E.g random() in python ▪ Step 2: Convert this sample u into an outcome for the given distribution by having each outcome associated with a sub-interval of [0,1) with sub-interval size equal to probability of the outcome ▪ Example C red green P(C) 0.6 0.1 blue 0.3 ▪ If random() returns u = 0.83, then our sample is C = blue ▪ E.g, after sampling 8 times: Source: http://www.doksinet Sampling in Bayes’ Nets ▪ Prior Sampling ▪ Rejection Sampling ▪ Likelihood Weighting ▪ Gibbs Sampling Source: http://www.doksinet Prior

Sampling Source: http://www.doksinet Prior Sampling +c -c 0.5 0.5 Cloudy +c -c +s +s -s +s -s 0.1 0.9 0.5 0.5 +r -r -s +r -r +c Sprinkler +w -w +w -w +w -w +w -w 0.99 0.01 0.90 0.10 0.90 0.10 0.01 0.99 -c Rain WetGrass +r -r +r -r 0.8 0.2 0.2 0.8 Samples: +c, -s, +r, +w -c, +s, -r, +w Source: http://www.doksinet Prior Sampling ▪ For i=1, 2, , n ▪ Sample xi from P(Xi | Parents(Xi)) ▪ Return (x1, x2, , xn) Source: http://www.doksinet Prior Sampling ▪ This process generates samples with probability: i.e the BN’s joint probability ▪ Let the number of samples of an event be ▪ Then ▪ I.e, the sampling procedure is consistent Source: http://www.doksinet Example ▪ We’ll get a bunch of samples from the BN: +c, -s, +r, +w +c, +s, +r, +w -c, +s, +r, -w +c, -s, +r, +w -c, -s, -r, +w C S ▪ If we want to know P(W) ▪ ▪ ▪ ▪ ▪ ▪ We have counts <+w:4, -w:1> Normalize to get P(W) = <+w:0.8, -w:02> This will get closer to

the true distribution with more samples Can estimate anything else, too What about P(C| +w)? P(C| +r, +w)? P(C| -r, -w)? Fast: can use fewer samples if less time (what’s the drawback?) R W Source: http://www.doksinet Rejection Sampling Source: http://www.doksinet Rejection Sampling ▪ Let’s say we want P(C) ▪ No point keeping all samples around ▪ Just tally counts of C as we go C ▪ Let’s say we want P(C| +s) ▪ Same thing: tally C outcomes, but ignore (reject) samples which don’t have S=+s ▪ This is called rejection sampling ▪ It is also consistent for conditional probabilities (i.e, correct in the limit) S R W +c, -s, +r, +w +c, +s, +r, +w -c, +s, +r, -w +c, -s, +r, +w -c, -s, -r, +w Source: http://www.doksinet Rejection Sampling ▪ IN: evidence instantiation ▪ For i=1, 2, , n ▪ Sample xi from P(Xi | Parents(Xi)) ▪ If xi not consistent with evidence ▪ Reject: Return, and no sample is generated in this cycle ▪ Return (x1, x2, , xn)
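The prior-sampling and rejection-sampling procedures above are short enough to sketch directly. Below is a minimal Python sketch using the CPT numbers from the Cloudy/Sprinkler/Rain/WetGrass network on these slides; the dictionary encoding and the function names (prior_sample, rejection_sample) are choices made here for illustration, not course code.

```python
import random

# CPTs from the sprinkler network on the slides, stored as P(var = True | parents).
P_C = 0.5
P_S_GIVEN_C = {True: 0.1, False: 0.5}
P_R_GIVEN_C = {True: 0.8, False: 0.2}
P_W_GIVEN_SR = {(True, True): 0.99, (True, False): 0.90,
                (False, True): 0.90, (False, False): 0.01}

def prior_sample():
    """Sample each variable in topological order from P(Xi | Parents(Xi))."""
    c = random.random() < P_C
    s = random.random() < P_S_GIVEN_C[c]
    r = random.random() < P_R_GIVEN_C[c]
    w = random.random() < P_W_GIVEN_SR[(s, r)]
    return {'C': c, 'S': s, 'R': r, 'W': w}

def rejection_sample(query, evidence, n=100000):
    """Estimate P(query | evidence): tally query outcomes over the samples
    that agree with the evidence, and throw the rest away."""
    counts = {True: 0, False: 0}
    for _ in range(n):
        sample = prior_sample()
        if all(sample[var] == val for var, val in evidence.items()):
            counts[sample[query]] += 1
        # else: reject; no sample is generated in this cycle
    total = counts[True] + counts[False]
    return {value: count / total for value, count in counts.items()}

if __name__ == '__main__':
    random.seed(0)
    print(rejection_sample('C', {'S': True}))   # approximates P(C | +s), the slide's query
```

When the evidence is unlikely, almost every sample is rejected, which is exactly the weakness that likelihood weighting (next) addresses.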

Source: http://www.doksinet Likelihood Weighting Source: http://www.doksinet Likelihood Weighting ▪ Problem with rejection sampling: ▪ If evidence is unlikely, rejects lots of samples ▪ Evidence not exploited as you sample ▪ Consider P(Shape|blue) Shape Color pyramid, pyramid, sphere, cube, sphere, green red blue red green ▪ Idea: fix evidence variables and sample the rest ▪ Problem: sample distribution not consistent! ▪ Solution: weight by probability of evidence given parents pyramid, blue pyramid, blue sphere, blue Shape Color cube, blue sphere, blue Source: http://www.doksinet Likelihood Weighting +c -c 0.5 0.5 Cloudy +c -c +s +s -s +s -s 0.1 0.9 0.5 0.5 +r -r -s +r -r +c Sprinkler +w -w +w -w +w -w +w -w 0.99 0.01 0.90 0.10 0.90 0.10 0.01 0.99 Rain WetGrass -c +r -r +r -r 0.8 0.2 0.2 0.8 Samples: +c, +s, +r, +w Source: http://www.doksinet Likelihood Weighting ▪ IN: evidence instantiation ▪ w = 1.0 ▪ for i=1, 2, , n ▪ if Xi

is an evidence variable ▪ Xi = observation xi for Xi ▪ Set w = w * P(xi | Parents(Xi)) ▪ else ▪ Sample xi from P(Xi | Parents(Xi)) ▪ return (x1, x2, , xn), w Source: http://www.doksinet Likelihood Weighting ▪ Sampling distribution if z sampled and e fixed evidence Cloudy C ▪ Now, samples have weights S R W ▪ Together, weighted sampling distribution is consistent Source: http://www.doksinet Likelihood Weighting ▪ Likelihood weighting is good ▪ We have taken evidence into account as we generate the sample ▪ E.g here, W’s value will get picked based on the evidence values of S, R ▪ More of our samples will reflect the state of the world suggested by the evidence ▪ Likelihood weighting doesn’t solve all our problems ▪ Evidence influences the choice of downstream variables, but not upstream ones (C isn’t more likely to get a value matching the evidence) ▪ We would like to consider evidence when we sample every variable Gibbs sampling
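Before turning to Gibbs sampling, here is a small Python sketch of likelihood weighting on the same example network. The NETWORK/ORDER encoding and the function names are illustrative assumptions, not the course's code; the algorithm itself follows the procedure above: evidence variables are clamped, and each sample's weight is the product of P(evidence value | parents) over the evidence variables.

```python
import random

# The sprinkler network from the slides: per variable, (parents, CPT).
# CPT keys are tuples of parent values; entries are P(var = True | parents).
NETWORK = {
    'C': ((), {(): 0.5}),
    'S': (('C',), {(True,): 0.1, (False,): 0.5}),
    'R': (('C',), {(True,): 0.8, (False,): 0.2}),
    'W': (('S', 'R'), {(True, True): 0.99, (True, False): 0.90,
                       (False, True): 0.90, (False, False): 0.01}),
}
ORDER = ['C', 'S', 'R', 'W']   # a topological order of the variables

def weighted_sample(evidence):
    """One likelihood-weighted sample: fix evidence variables, sample the rest."""
    sample, weight = {}, 1.0
    for var in ORDER:
        parents, cpt = NETWORK[var]
        p_true = cpt[tuple(sample[p] for p in parents)]
        if var in evidence:
            sample[var] = evidence[var]
            weight *= p_true if evidence[var] else 1.0 - p_true
        else:
            sample[var] = random.random() < p_true
    return sample, weight

def likelihood_weighting(query, evidence, n=100000):
    """Estimate P(query | evidence) from weighted counts, then normalize."""
    totals = {True: 0.0, False: 0.0}
    for _ in range(n):
        sample, weight = weighted_sample(evidence)
        totals[sample[query]] += weight
    z = totals[True] + totals[False]
    return {value: t / z for value, t in totals.items()}

if __name__ == '__main__':
    random.seed(0)
    print(likelihood_weighting('R', {'W': True}))   # approximates P(R | +w)
```

Because only evidence variables contribute to the weight, downstream sampling sees the evidence but upstream variables do not, which is the limitation Gibbs sampling is designed to remove.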

Source: http://www.doksinet Gibbs Sampling Source: http://www.doksinet Gibbs Sampling ▪ Procedure: keep track of a full instantiation x1, x2, , xn. Start with an arbitrary instantiation consistent with the evidence. Sample one variable at a time, conditioned on all the rest, but keep evidence fixed. Keep repeating this for a long time. ▪ Property: in the limit of repeating this infinitely many times the resulting sample is coming from the correct distribution ▪ Rationale: both upstream and downstream variables condition on evidence. ▪ In contrast: likelihood weighting only conditions on upstream evidence, and hence weights obtained in likelihood weighting can sometimes be very small. Sum of weights over all samples is indicative of how many “effective” samples were obtained, so want high weight. Source: http://www.doksinet Gibbs Sampling Example: P( S | +r) ▪ Step 1: Fix evidence ▪ Step 2: Initialize other variables C C ▪ Randomly ▪ R = +r S +r S +r

W W ▪ Step 3: Repeat ▪ Choose a non-evidence variable X ▪ Resample X from P( X | all other variables) C S C +r W S C +r W S C +r W S C +r W S C +r W S +r W Source: http://www.doksinet Efficient Resampling of One Variable ▪ Sample from P(S | +c, +r, -w) C S +r W ▪ Many things cancel out – only CPTs with S remain! ▪ More generally: only the CPTs that mention the resampled variable need to be considered, and joined together Source: http://www.doksinet Bayes’ Net Sampling Summary ▪ Prior Sampling P ▪ Rejection Sampling P( Q | e ) ▪ Likelihood Weighting P( Q | e) ▪ Gibbs Sampling P( Q | e ) Source: http://www.doksinet Further Reading on Gibbs Sampling* ▪ Gibbs sampling produces a sample from the query distribution P( Q | e ) in the limit of re-sampling infinitely often ▪ Gibbs sampling is a special case of more general methods called Markov chain Monte Carlo (MCMC) methods ▪ Metropolis-Hastings is one of the more famous MCMC methods (in fact,

Gibbs sampling is a special case of Metropolis-Hastings) ▪ You may read about Monte Carlo methods – they’re just sampling Source: http://www.doksinet How About Particle Filtering? X1 X2 = likelihood weighting E2 Elapse Particles: (3,3) (2,3) (3,3) (3,2) (3,3) (3,2) (1,2) (3,3) (3,3) (2,3) Weight Particles: (3,2) (2,3) (3,2) (3,1) (3,3) (3,2) (1,3) (2,3) (3,2) (2,2) Resample Particles: (3,2) w=.9 (2,3) w=.2 (3,2) w=.9 (3,1) w=.4 (3,3) w=.4 (3,2) w=.9 (1,3) w=.1 (2,3) w=.2 (3,2) w=.9 (2,2) w=.4 (New) Particles: (3,2) (2,2) (3,2) (2,3) (3,3) (3,2) (1,3) (2,3) (3,2) (3,2) Source: http://www.doksinet Particle Filtering ▪ Particle filtering operates on ensemble of samples ▪ Performs likelihood weighting for each individual sample to elapse time and incorporate evidence ▪ Resamples from the weighted ensemble of samples to focus computation for the next time step where most of the probability mass is estimated to be Source: http://www.doksinet CS 188: Artificial

Intelligence Decision Networks and Value of Perfect Information Instructors: Dan Klein and Pieter Abbeel University of California, Berkeley [These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://aiberkeleyedu] Source: http://www.doksinet Decision Networks Source: http://www.doksinet Decision Networks Umbrella U Weather Forecast Source: http://www.doksinet Decision Networks ▪ MEU: choose the action which maximizes the expected utility given the evidence ▪ Can directly operationalize this with decision networks ▪ Bayes nets with nodes for utility and actions Umbrella ▪ Lets us calculate the expected utility for each action ▪ New node types: U Weather ▪ Chance nodes (just like BNs) ▪ Actions (rectangles, cannot have parents, act as observed evidence) ▪ Utility node (diamond, depends on action and chance nodes) Forecast Source: http://www.doksinet Decision Networks ▪

Action selection ▪ Instantiate all evidence Umbrella ▪ Set action node(s) each possible way ▪ Calculate posterior for all parents of utility node, given the evidence U Weather ▪ Calculate expected utility for each action ▪ Choose maximizing action Forecast Source: http://www.doksinet Decision Networks Umbrella = leave Umbrella U Umbrella = take Weather Optimal decision = leave A W U(A,W) W P(W) leave sun 100 sun 0.7 leave rain 0 rain 0.3 take sun 20 take rain 70 Source: http://www.doksinet Decisions as Outcome Trees {} Umbrella Weather | {} U Weather | {} Weather U(t,s) U(t,r) U(l,s) ▪ Almost exactly like expectimax / MDPs ▪ What’s changed? U(l,r) Source: http://www.doksinet Example: Decision Networks Umbrella = leave Umbrella U Umbrella = take Optimal decision = take Weather Forecast =bad W P(W|F=bad) sun 0.34 rain 0.66 A W U(A,W) leave sun 100 leave rain 0 take sun 20 take rain 70 Source:
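As a worked check of this action-selection recipe, the sketch below evaluates the umbrella example using the U(A, W) and P(W) tables above; the variable names and dictionary layout are only illustrative. It reproduces the slide's conclusion that leaving the umbrella is the MEU action when no forecast has been observed.

```python
# Utility table U(A, W) and weather posterior P(W) from the umbrella example.
UTILITY = {('leave', 'sun'): 100, ('leave', 'rain'): 0,
           ('take',  'sun'):  20, ('take',  'rain'): 70}
P_WEATHER = {'sun': 0.7, 'rain': 0.3}   # posterior over W given the (empty) evidence

def expected_utility(action, p_weather):
    """EU(a) = sum over w of P(w | evidence) * U(a, w)."""
    return sum(p * UTILITY[(action, w)] for w, p in p_weather.items())

def meu_action(p_weather, actions=('leave', 'take')):
    """Set the action node each possible way and keep the EU-maximizing action."""
    return max(actions, key=lambda a: expected_utility(a, p_weather))

print(expected_utility('leave', P_WEATHER))   # 0.7*100 + 0.3*0  = 70.0
print(expected_utility('take', P_WEATHER))    # 0.7*20  + 0.3*70 = 35.0
print(meu_action(P_WEATHER))                  # 'leave', matching the slide
```

Re-running meu_action with the posterior P(W | F = bad) flips the decision to taking the umbrella, which is the point of the forecast example that follows.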

http://www.doksinet Decisions as Outcome Trees Umbrella {b} U W | {b} W | {b} Weather U(t,s) Forecast =bad U(t,r) U(l,s) U(l,r) Source: http://www.doksinet Ghostbusters Decision Network Demo: Ghostbusters with probability Bust U Ghost Location Sensor (1,1) Sensor (1,2) Sensor (1,3) Sensor (1,n) Sensor (2,1) Sensor (m,1) Sensor (m,n) Source: http://www.doksinet Video of Demo Ghostbusters with Probability Source: http://www.doksinet Value of Information Source: http://www.doksinet Value of Information ▪ Idea: compute value of acquiring evidence ▪ Can be done directly from decision network ▪ Example: buying oil drilling rights ▪ ▪ ▪ ▪ Two blocks A and B, exactly one has oil, worth k You can drill in one location Prior probabilities 0.5 each, & mutually exclusive Drilling in either A or B has EU = k/2, MEU = k/2 ▪ Question: what’s the value of information of O? ▪ ▪ ▪ ▪ ▪ ▪ ▪ Value of knowing which of A or B has

oil Value is expected gain in MEU from new info Survey may say “oil in a” or “oil in b,” prob 0.5 each If we know OilLoc, MEU is k (either way) Gain in MEU from knowing OilLoc? VPI(OilLoc) = k/2 Fair price of information: k/2 DrillLoc U O P a 1/2 b 1/2 OilLoc D O U a a k a b 0 b a 0 b b k Source: http://www.doksinet VPI Example: Weather MEU with no evidence Umbrella U MEU if forecast is bad Weather MEU if forecast is good Forecast Forecast distribution F P(F) good 0.59 bad 0.41 A W U leave sun 100 leave rain 0 take sun 20 take rain 70 Source: http://www.doksinet Value of Information ▪ ▪ ▪ ▪ Assume we have evidence E=e. Value if we act now: Assume we see that E’ = e’. Value if we act then: BUT E’ is a random variable whose value is unknown, so we don’t know what e’ will be Expected value if E’ is revealed and then we act: a {+e} P(s | +e) U {+e, +e’} a P(s | +e, +e’) U {+e} P(+e’ | +e) ▪

{+e, +e’} Value of information: how much MEU goes up by revealing E’ first then acting, over acting now: a P(-e’ | +e) {+e, -e’} Source: http://www.doksinet VPI Properties ▪ Nonnegative ▪ Nonadditive (think of observing Ej twice) ▪ Order-independent Source: http://www.doksinet Quick VPI Questions ▪ The soup of the day is either clam chowder or split pea, but you wouldn’t order either one. What’s the value of knowing which it is? ▪ There are two kinds of plastic forks at a picnic. One kind is slightly sturdier What’s the value of knowing which? ▪ You’re playing the lottery. The prize will be $0 or $100. You can play any number between 1 and 100 (chance of winning is 1%). What is the value of knowing the winning number? Source: http://www.doksinet Value of Imperfect Information? ▪ No such thing ▪ Information corresponds to the observation of a node in the decision network ▪ If data is “noisy” that just means we don’t observe the

original variable, but another variable which is a noisy version of the original one Source: http://www.doksinet VPI Question ▪ VPI(OilLoc) ? DrillLoc U ▪ VPI(ScoutingReport) ? ▪ VPI(Scout) ? ▪ VPI(Scout | ScoutingReport) ? ▪ Generally: If Parents(U) Z | CurrentEvidence Then VPI( Z | CurrentEvidence) = 0 Scout OilLoc Scouting Report Source: http://www.doksinet POMDPs Source: http://www.doksinet POMDPs ▪ MDPs have: ▪ ▪ ▪ ▪ States S Actions A Transition function P(s’|s,a) (or T(s,a,s’)) Rewards R(s,a,s’) s a s, a s,a,s’ s’ ▪ POMDPs add: b ▪ Observations O ▪ Observation function P(o|s) (or O(s,o)) ▪ POMDPs are MDPs over belief states b (distributions over S) ▪ We’ll be able to say more in a few lectures a b, a o b’ Demo: Ghostbusters with VPI Source: http://www.doksinet Example: Ghostbusters ▪ In (static) Ghostbusters: ▪ Belief state determined by evidence to date {e} ▪ Tree really over evidence sets ▪

Probabilistic reasoning needed to predict new evidence given past evidence b a a b, a e, a e’ e’ b’ ▪ Solving POMDPs ▪ One way: use truncated expectimax to compute approximate value of actions ▪ What if you only considered busting or one sense followed by a bust? ▪ You get a VPI-based agent! {e} {e, e’} {e} abust asense {e}, asense U(abust, {e}) e’ {e, e’} abust U(abust, {e, e’}) Source: http://www.doksinet Video of Demo Ghostbusters with VPI Source: http://www.doksinet More Generally* ▪ General solutions map belief functions to actions ▪ Can divide regions of belief space (set of belief functions) into policy regions (gets complex quickly) ▪ Can build approximate policies using discretization methods ▪ Can factor belief functions in various ways ▪ Overall, POMDPs are very (actually PSPACE-) hard ▪ Most real problems are POMDPs, but we can rarely solve them in general! Source: http://www.doksinet Next Time: Machine Learning

Source: http://www.doksinet CS 188: Artificial Intelligence Naïve Bayes Instructors: Dan Klein and Pieter Abbeel --- University of California, Berkeley [These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://aiberkeleyedu] Source: http://www.doksinet Machine Learning ▪ Up until now: how to use a model to make optimal decisions ▪ Machine learning: how to acquire a model from data / experience ▪ Learning parameters (e.g probabilities) ▪ Learning structure (e.g BN graphs) ▪ Learning hidden concepts (e.g clustering) ▪ Today: model-based classification with Naive Bayes Source: http://www.doksinet Classification Source: http://www.doksinet Example: Spam Filter ▪ Input: an email ▪ Output: spam/ham ▪ Setup: ▪ Get a large collection of example emails, each labeled “spam” or “ham” ▪ Note: someone has to hand label all this data! ▪ Want to learn to predict labels of new,

future emails ▪ Features: The attributes used to make the ham / spam decision ▪ ▪ ▪ ▪ Words: FREE! Text Patterns: $dd, CAPS Non-text: SenderInContacts Dear Sir. First, I must solicit your confidence in this transaction, this is by virture of its nature as being utterly confidencial and top secret. TO BE REMOVED FROM FUTURE MAILINGS, SIMPLY REPLY TO THIS MESSAGE AND PUT "REMOVE" IN THE SUBJECT. 99 MILLION EMAIL ADDRESSES FOR ONLY $99 Ok, Iknow this is blatantly OT but Im beginning to go insane. Had an old Dell Dimension XPS sitting in the corner and decided to put it to use, I know it was working pre being stuck in the corner, but when I plugged it in, hit the power nothing happened. Source: http://www.doksinet Example: Digit Recognition ▪ Input: images / pixel grids ▪ Output: a digit 0-9 ▪ Setup: ▪ Get a large collection of example images, each labeled with a digit ▪ Note: someone has to hand label all this data! ▪ Want to learn to predict

labels of new, future digit images 0 1 2 1 ▪ Features: The attributes used to make the digit decision ▪ Pixels: (6,8)=ON ▪ Shape Patterns: NumComponents, AspectRatio, NumLoops ▪ ?? Source: http://www.doksinet Other Classification Tasks ▪ Classification: given inputs x, predict labels (classes) y ▪ Examples: ▪ Spam detection (input: document, classes: spam / ham) ▪ OCR (input: images, classes: characters) ▪ Medical diagnosis (input: symptoms, classes: diseases) ▪ Automatic essay grading (input: document, classes: grades) ▪ Fraud detection (input: account activity, classes: fraud / no fraud) ▪ Customer service email routing ▪ many more ▪ Classification is an important commercial technology! Source: http://www.doksinet Model-Based Classification Source: http://www.doksinet Model-Based Classification ▪ Model-based approach ▪ Build a model (e.g Bayes’ net) where both the label and features are random variables ▪ Instantiate any observed

features ▪ Query for the distribution of the label conditioned on the features ▪ Challenges ▪ What structure should the BN have? ▪ How should we learn its parameters? Source: http://www.doksinet Naïve Bayes for Digits ▪ Naïve Bayes: Assume all features are independent effects of the label Y ▪ Simple digit recognition version: ▪ One feature (variable) Fij for each grid position <i,j> ▪ Feature values are on / off, based on whether intensity is more or less than 0.5 in underlying image ▪ Each input maps to a feature vector, e.g ▪ Here: lots of features, each is binary valued ▪ Naïve Bayes model: ▪ What do we need to learn? F1 F2 Fn Source: http://www.doksinet General Naïve Bayes ▪ A general Naive Bayes model: Y |Y| parameters F1 |Y| x |F|^n values n x |F| x |Y| parameters ▪ We only have to specify how each feature depends on the class ▪ Total number of parameters is linear in n ▪ Model is very simplistic, but often works anyway
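To make the parameter structure concrete, here is a small sketch of how the Naive Bayes joint and posterior are computed from P(Y) and the per-feature P(Fi | Y) tables. The function names and the nested-dictionary layout are assumptions made here for illustration, and the numbers in the demo at the bottom are arbitrary placeholders rather than values from the slides.

```python
def joint_probability(y, features, prior, cond):
    """P(y, f1, ..., fn) = P(y) * product over i of P(fi | y).
    prior[y] is P(Y = y); cond[i][y][f] is P(Fi = f | Y = y)."""
    p = prior[y]
    for i, f in enumerate(features):
        p *= cond[i][y][f]
    return p

def posterior(features, prior, cond):
    """P(Y | f1, ..., fn): compute the joint for each label, then normalize."""
    joints = {y: joint_probability(y, features, prior, cond) for y in prior}
    z = sum(joints.values())
    return {y: p / z for y, p in joints.items()}

# Two labels and two binary features; these numbers are placeholders for illustration.
prior = {'spam': 0.5, 'ham': 0.5}
cond = [{'spam': {True: 0.8, False: 0.2}, 'ham': {True: 0.1, False: 0.9}},
        {'spam': {True: 0.3, False: 0.7}, 'ham': {True: 0.4, False: 0.6}}]
print(posterior([True, False], prior, cond))
```

The parameter count is visible in the layout: one table for P(Y) plus one small table per feature, so the total number of parameters grows linearly in n rather than exponentially.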

F2 Fn Source: http://www.doksinet Inference for Naïve Bayes ▪ Goal: compute posterior distribution over label variable Y ▪ Step 1: get joint probability of label and evidence for each label + ▪ Step 2: sum to get probability of evidence ▪ Step 3: normalize by dividing Step 1 by Step 2 Source: http://www.doksinet General Naïve Bayes ▪ What do we need in order to use Naïve Bayes? ▪ Inference method (we just saw this part) ▪ Start with a bunch of probabilities: P(Y) and the P(Fi|Y) tables ▪ Use standard inference to compute P(Y|F1Fn) ▪ Nothing new here ▪ Estimates of local conditional probability tables ▪ P(Y), the prior over labels ▪ P(Fi|Y) for each feature (evidence variable) ▪ These probabilities are collectively called the parameters of the model and denoted by  ▪ Up until now, we assumed these appeared by magic, but ▪ they typically come from training data counts: we’ll look at this soon Source: http://www.doksinet Example:

Conditional Probabilities (per digit y: P(Y = y) = 0.1, followed by P(F = on | Y = y) for two different pixel features) 1: 0.1, 0.01, 0.05; 2: 0.1, 0.05, 0.01; 3: 0.1, 0.05, 0.90; 4: 0.1, 0.30, 0.80; 5: 0.1, 0.80, 0.90; 6: 0.1, 0.90, 0.90; 7: 0.1, 0.05, 0.25; 8: 0.1, 0.60, 0.85; 9: 0.1, 0.50, 0.60; 0: 0.1, 0.80, 0.80 Source: http://www.doksinet Naïve Bayes for Text ▪ Bag-of-words Naïve Bayes: ▪ ▪ ▪ ▪ Features: Wi is the word at position i As before: predict label conditioned on feature variables (spam vs. ham) As before: assume features are conditionally independent given label New: each Wi is identically distributed ▪ Generative model: Word at position i, not ith word in the dictionary! ▪ “Tied” distributions and bag-of-words ▪ Usually, each variable gets its own conditional probability distribution P(F|Y) ▪ In a bag-of-words model ▪ Each position is identically distributed ▪ All positions share the same conditional probs P(W|Y) ▪ Why make this assumption? ▪ Called

“bag-of-words” because model is insensitive to word order or reordering Source: http://www.doksinet Example: Spam Filtering ▪ Model: ▪ What are the parameters? ham : 0.66 spam: 0.33 the : to : and : of : you : a : with: from: . ▪ Where do these tables come from? 0.0156 0.0153 0.0115 0.0095 0.0093 0.0086 0.0080 0.0075 the : to : of : 2002: with: from: and : a : . 0.0210 0.0133 0.0119 0.0110 0.0108 0.0107 0.0105 0.0100 Source: http://www.doksinet Spam Example Word P(w|spam) P(w|ham) Tot Spam Tot Ham (prior) 0.33333 0.66666 -1.1 -0.4 Gary 0.00002 0.00021 -11.8 -8.9 would 0.00069 0.00084 -19.1 -16.0 you 0.00881 0.00304 -23.8 -21.8 like 0.00086 0.00083 -30.9 -28.9 to 0.01517 0.01339 -35.1 -33.2 lose 0.00008 0.00002 -44.5 -44.0 weight 0.00016 0.00002 -53.3 -55.0 while 0.00027 0.00027 -61.5 -63.2 you 0.00881 0.00304 -66.2 -69.0 sleep 0.00006 0.00001 -76.0 -80.5 P(spam | w) = 98.9 Source: http://www.doksinet
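The running "Tot Spam" / "Tot Ham" columns in this example are cumulative log probabilities, which is how real classifiers avoid underflow when many word probabilities are multiplied. The sketch below redoes the computation in Python with the per-word numbers from the table; subtracting the max score before exponentiating is a standard numerical-stability detail added here, not something stated on the slide.

```python
import math

# Per-word probabilities from the "Gary would you like to lose weight ..." table.
PRIOR = {'spam': 1 / 3, 'ham': 2 / 3}
P_WORD = {
    'spam': {'Gary': 0.00002, 'would': 0.00069, 'you': 0.00881, 'like': 0.00086,
             'to': 0.01517, 'lose': 0.00008, 'weight': 0.00016, 'while': 0.00027,
             'sleep': 0.00006},
    'ham':  {'Gary': 0.00021, 'would': 0.00084, 'you': 0.00304, 'like': 0.00083,
             'to': 0.01339, 'lose': 0.00002, 'weight': 0.00002, 'while': 0.00027,
             'sleep': 0.00001},
}

def log_score(label, words):
    """log P(label) + sum of log P(w | label): the running totals in the table."""
    return math.log(PRIOR[label]) + sum(math.log(P_WORD[label][w]) for w in words)

def posterior(words):
    """Normalize the exponentiated scores to get P(label | words)."""
    scores = {label: log_score(label, words) for label in PRIOR}
    m = max(scores.values())                       # for numerical stability
    unnorm = {label: math.exp(s - m) for label, s in scores.items()}
    z = sum(unnorm.values())
    return {label: u / z for label, u in unnorm.items()}

words = ['Gary', 'would', 'you', 'like', 'to', 'lose', 'weight', 'while', 'you', 'sleep']
print(posterior(words))   # P(spam | words) is about 0.989, the 98.9 on the slide
```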

Training and Testing Source: http://www.doksinet Important Concepts ▪ Data: labeled instances, e.g emails marked spam/ham ▪ Training set ▪ Held out set ▪ Test set ▪ Features: attribute-value pairs which characterize each x ▪ Experimentation cycle ▪ ▪ ▪ ▪ ▪ Learn parameters (e.g model probabilities) on training set (Tune hyperparameters on held-out set) Compute accuracy of test set Very important: never “peek” at the test set! Evaluation ▪ Accuracy: fraction of instances predicted correctly ▪ Training Data Held-Out Data Overfitting and generalization ▪ Want a classifier which does well on test data ▪ Overfitting: fitting the training data very closely, but not generalizing well ▪ We’ll investigate overfitting and generalization formally in a few lectures Test Data Source: http://www.doksinet Generalization and Overfitting Source: http://www.doksinet Overfitting 30 25 20 Degree 15 polynomial 15 10 5 0 -5 -10 -15 0 2

4 6 8 10 12 14 16 18 20 Source: http://www.doksinet Example: Overfitting 2 wins!! Source: http://www.doksinet Example: Overfitting ▪ Posteriors determined by relative probabilities (odds ratios): south-west nation morally nicely extent seriously . : : : : : : inf inf inf inf inf inf screens minute guaranteed $205.00 delivery signature . What went wrong here? : : : : : : inf inf inf inf inf inf Source: http://www.doksinet Generalization and Overfitting ▪ Relative frequency parameters will overfit the training data! ▪ ▪ ▪ ▪ ▪ Just because we never saw a 3 with pixel (15,15) on during training doesn’t mean we won’t see it at test time Unlikely that every occurrence of “minute” is 100% spam Unlikely that every occurrence of “seriously” is 100% ham What about all the words that don’t occur in the training set at all? In general, we can’t go around giving unseen events zero probability ▪ As an extreme case, imagine using the entire

email as the only feature ▪ Would get the training data perfect (if deterministic labeling) ▪ Wouldn’t generalize at all ▪ Just making the bag-of-words assumption gives us some generalization, but isn’t enough ▪ To generalize better: we need to smooth or regularize the estimates Source: http://www.doksinet Parameter Estimation Source: http://www.doksinet Parameter Estimation ▪ Estimating the distribution of a random variable r b ▪ Elicitation: ask a human (why is this hard?) b ▪ Empirically: use training data (learning!) ▪ E.g: for each outcome x, look at the empirical rate of that value: r r ▪ This is the estimate that maximizes the likelihood of the data b br b b b br r b r b b Source: http://www.doksinet Smoothing Source: http://www.doksinet Maximum Likelihood? ▪ Relative frequencies are the maximum likelihood estimates ▪ Another option is to consider the most likely parameter value given the data ???? Source:

http://www.doksinet Unseen Events Source: http://www.doksinet Laplace Smoothing ▪ Laplace’s estimate: ▪ Pretend you saw every outcome once more than you actually did ▪ Can derive this estimate with Dirichlet priors (see cs281a) r r b Source: http://www.doksinet Laplace Smoothing ▪ Laplace’s estimate (extended): ▪ Pretend you saw every outcome k extra times ▪ What’s Laplace with k = 0? ▪ k is the strength of the prior ▪ Laplace for conditionals: ▪ Smooth each condition independently: r r b Source: http://www.doksinet Estimation: Linear Interpolation* ▪ In practice, Laplace often performs poorly for P(X|Y): ▪ When |X| is very large ▪ When |Y| is very large ▪ Another option: linear interpolation ▪ Also get the empirical P(X) from the data ▪ Make sure the estimate of P(X|Y) isn’t too different from the empirical P(X) ▪ What if  is 0? 1? ▪ For even better ways to estimate parameters, as well as details of the math, see

cs281a, cs288 Source: http://www.doksinet Real NB: Smoothing ▪ For real classification problems, smoothing is critical ▪ New odds ratios: helvetica seems group ago areas . : 11.4 : 10.8 : 10.2 : 8.4 : 8.3 verdana Credit ORDER <FONT> money . Do these make more sense? : : : : : 28.8 28.4 27.2 26.9 26.5 Source: http://www.doksinet Tuning Source: http://www.doksinet Tuning on Held-Out Data ▪ Now we’ve got two kinds of unknowns ▪ Parameters: the probabilities P(X|Y), P(Y) ▪ Hyperparameters: e.g the amount / type of smoothing to do, k,  ▪ What should we learn where? ▪ Learn parameters from training data ▪ Tune hyperparameters on different data ▪ Why? ▪ For each value of the hyperparameters, train and test on the held-out data ▪ Choose the best value and do a final test on the test data Source: http://www.doksinet Features Source: http://www.doksinet Errors, and What to Do ▪ Examples of errors Dear GlobalSCAPE Customer, GlobalSCAPE has

partnered with ScanSoft to offer you the latest version of OmniPage Pro, for just $99.99* - the regular list price is $499! The most common question weve received about this offer is - Is this genuine? We would like to assure you that this offer is authorized by ScanSoft, is genuine and valid. You can get the . To receive your $30 Amazoncom promotional certificate, click through to http://www.amazoncom/apparel and see the prominent link for the $30 offer. All details are there. We hope you enjoyed receiving this message However, if youd rather not receive future e-mails announcing new store launches, please click . Source: http://www.doksinet What to Do About Errors? ▪ Need more features– words aren’t enough! ▪ ▪ ▪ ▪ ▪ ▪ Have you emailed the sender before? Have 1K other people just gotten the same email? Is the sending information consistent? Is the email in ALL CAPS? Do inline URLs point where they say they point? Does the email address you by (your)

name? ▪ Can add these information sources as new variables in the NB model ▪ Next class we’ll talk about classifiers which let you easily add arbitrary features more easily Source: http://www.doksinet Baselines ▪ First step: get a baseline ▪ Baselines are very simple “straw man” procedures ▪ Help determine how hard the task is ▪ Help know what a “good” accuracy is ▪ Weak baseline: most frequent label classifier ▪ ▪ ▪ ▪ Gives all test instances whatever label was most common in the training set E.g for spam filtering, might label everything as ham Accuracy might be very high if the problem is skewed E.g calling everything “ham” gets 66%, so a classifier that gets 70% isn’t very good ▪ For real research, usually use previous work as a (strong) baseline Source: http://www.doksinet Confidences from a Classifier ▪ The confidence of a probabilistic classifier: ▪ Posterior over the top label ▪ Represents how sure the classifier is of

the classification ▪ Any probabilistic model will have confidences ▪ No guarantee confidence is correct ▪ Calibration ▪ Weak calibration: higher confidences mean higher accuracy ▪ Strong calibration: confidence predicts accuracy rate ▪ What’s the value of calibration? Source: http://www.doksinet Summary ▪ Bayes rule lets us do diagnostic queries with causal probabilities ▪ The naïve Bayes assumption takes all features to be independent given the class label ▪ We can build classifiers out of a naïve Bayes model using training data ▪ Smoothing estimates is important in real systems ▪ Classifier confidences are useful, when you can get them Source: http://www.doksinet Next Time: Perceptron! Source: http://www.doksinet CS 188: Artificial Intelligence Perceptrons Instructors: Dan Klein and Pieter Abbeel --- University of California, Berkeley [These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188

materials are available at http://aiberkeleyedu] Source: http://www.doksinet Error-Driven Classification Source: http://www.doksinet Errors, and What to Do ▪ Examples of errors Dear GlobalSCAPE Customer, GlobalSCAPE has partnered with ScanSoft to offer you the latest version of OmniPage Pro, for just $99.99* - the regular list price is $499! The most common question weve received about this offer is - Is this genuine? We would like to assure you that this offer is authorized by ScanSoft, is genuine and valid. You can get the . To receive your $30 Amazoncom promotional certificate, click through to http://www.amazoncom/apparel and see the prominent link for the $30 offer. All details are there. We hope you enjoyed receiving this message However, if youd rather not receive future e-mails announcing new store launches, please click . Source: http://www.doksinet What to Do About Errors ▪ Problem: there’s still spam in your inbox ▪ Need more features – words

aren’t enough! ▪ ▪ ▪ ▪ ▪ ▪ Have you emailed the sender before? Have 1M other people just gotten the same email? Is the sending information consistent? Is the email in ALL CAPS? Do inline URLs point where they say they point? Does the email address you by (your) name? ▪ Naïve Bayes models can incorporate a variety of features, but tend to do best in homogeneous cases (e.g all features are word occurrences) Source: http://www.doksinet Later On Web Search Decision Problems Source: http://www.doksinet Linear Classifiers Source: http://www.doksinet Feature Vectors Hello, Do you want free printr cartriges? Why pay more when you can get them ABSOLUTELY FREE! Just # free YOUR NAME MISSPELLED FROM FRIEND . : : : : 2 0 2 0 PIXEL-7,12 PIXEL-7,13 . NUM LOOPS . : 1 : 0 : 1 SPAM or + “2” Source: http://www.doksinet Some (Simplified) Biology ▪ Very loose inspiration: human neurons Source: http://www.doksinet Linear Classifiers ▪ Inputs are feature

values ▪ Each feature has a weight ▪ Sum is the activation ▪ If the activation is: ▪ Positive, output +1 ▪ Negative, output -1 f1 f2 f3 w1 w2 w3  >0? Source: http://www.doksinet Weights ▪ Binary case: compare features to a weight vector ▪ Learning: figure out the weight vector from examples # free YOUR NAME MISSPELLED FROM FRIEND . : 4 :-1 : 1 :-3 Dot product positive means the positive class # free YOUR NAME MISSPELLED FROM FRIEND . : : : : 2 0 2 0 # free YOUR NAME MISSPELLED FROM FRIEND . : : : : 0 1 1 1 Source: http://www.doksinet Decision Rules Source: http://www.doksinet Binary Decision Rule ▪ ▪ ▪ ▪ Examples are points Any weight vector is a hyperplane One side corresponds to Y=+1 Other corresponds to Y=-1 BIAS : -3 free : 4 money : 2 . money ▪ In the space of feature vectors 2 +1 = SPAM 1 -1 = HAM 0 0 1 free Source: http://www.doksinet Weight Updates Source: http://www.doksinet Learning: Binary Perceptron ▪

Start with weights = 0 ▪ For each training instance: ▪ Classify with current weights ▪ If correct (i.e, y=y*), no change! ▪ If wrong: adjust the weight vector Source: http://www.doksinet Learning: Binary Perceptron ▪ Start with weights = 0 ▪ For each training instance: ▪ Classify with current weights ▪ If correct (i.e, y=y*), no change! ▪ If wrong: adjust the weight vector by adding or subtracting the feature vector. Subtract if y* is -1. Source: http://www.doksinet Examples: Perceptron ▪ Separable Case Source: http://www.doksinet Multiclass Decision Rule ▪ If we have multiple classes: ▪ A weight vector for each class: ▪ Score (activation) of a class y: ▪ Prediction highest score wins Binary = multiclass where the negative class has weight zero Source: http://www.doksinet Learning: Multiclass Perceptron ▪ Start with all weights = 0 ▪ Pick up training examples one by one ▪ Predict with current weights ▪ If correct, no change! ▪ If

wrong: lower score of wrong answer, raise score of right answer Source: http://www.doksinet Example: Multiclass Perceptron “win the vote” “win the election” “win the game” BIAS win game vote the . : : : : : 1 0 0 0 0 BIAS win game vote the . : : : : : 0 0 0 0 0 BIAS win game vote the . : : : : : 0 0 0 0 0 Source: http://www.doksinet Properties of Perceptrons ▪ Separability: true if some parameters get the training set perfectly correct Separable ▪ Convergence: if the training is separable, perceptron will eventually converge (binary case) ▪ Mistake Bound: the maximum number of mistakes (binary case) related to the margin or degree of separability Non-Separable Source: http://www.doksinet Examples: Perceptron ▪ Non-Separable Case Source: http://www.doksinet Improving the Perceptron Source: http://www.doksinet Problems with the Perceptron ▪ Noise: if the data isn’t separable, weights might thrash ▪ Averaging weight vectors over time

can help (averaged perceptron) ▪ Mediocre generalization: finds a “barely” separating solution ▪ Overtraining: test / held-out accuracy usually rises, then falls ▪ Overtraining is a kind of overfitting Source: http://www.doksinet Fixing the Perceptron ▪ Idea: adjust the weight update to mitigate these effects ▪ MIRA*: choose an update size that fixes the current mistake ▪ but, minimizes the change to w ▪ The +1 helps to generalize * Margin Infused Relaxed Algorithm Source: http://www.doksinet Minimum Correcting Update min not =0, or would not have made an error, so min will be where equality holds Source: http://www.doksinet Maximum Step Size ▪ In practice, it’s also bad to make updates that are too large ▪ Example may be labeled incorrectly ▪ You may not have enough features ▪ Solution: cap the maximum possible value of  with some constant C ▪ Corresponds to an optimization that assumes non-separable data ▪ Usually converges faster

than perceptron ▪ Usually better, especially on noisy data Source: http://www.doksinet Linear Separators ▪ Which of these linear separators is optimal? Source: http://www.doksinet Support Vector Machines ▪ ▪ ▪ ▪ Maximizing the margin: good according to intuition, theory, practice Only support vectors matter; other training examples are ignorable Support vector machines (SVMs) find the separator with max margin Basically, SVMs are MIRA where you optimize over all examples at once MIRA SVM Source: http://www.doksinet Classification: Comparison ▪ Naïve Bayes ▪ ▪ ▪ ▪ Builds a model training data Gives prediction probabilities Strong assumptions about feature independence One pass through data (counting) ▪ Perceptrons / MIRA: ▪ ▪ ▪ ▪ Makes less assumptions about data Mistake-driven learning Multiple passes through data (prediction) Often more accurate Source: http://www.doksinet Web Search Source: http://www.doksinet Extension: Web Search

▪ Information retrieval: ▪ Given information needs, produce information ▪ Includes, e.g web search, question answering, and classic IR ▪ Web search: not exactly classification, but rather ranking x = “Apple Computers” Source: http://www.doksinet Feature-Based Ranking x = “Apple Computer” x, x, Source: http://www.doksinet Perceptron for Ranking ▪ ▪ ▪ ▪ Inputs Candidates Many feature vectors: One weight vector: ▪ Prediction: ▪ Update (if wrong): Source: http://www.doksinet Apprenticeship Source: http://www.doksinet Pacman Apprenticeship! ▪ Examples are states s ▪ ▪ ▪ ▪ Candidates are pairs (s,a) “Correct” actions: those taken by expert Features defined over (s,a) pairs: f(s,a) Score of a q-state (s,a) given by: “correct” action a* ▪ How is this VERY different from reinforcement learning? [Demo: Pacman Apprentice (L22D1,2,3)] Source: http://www.doksinet Video of Demo Pacman Apprentice Source: http://www.doksinet
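Since the perceptron update on the preceding slides is just "on a mistake, add or subtract the feature vector," it fits in a few lines of Python. The sketch below uses sparse feature dictionaries in the spirit of the BIAS/free/money example; the tiny training set at the bottom is made up purely to exercise the code, and the function names are illustrative.

```python
def dot(weights, features):
    """Sparse dot product over named features (the activation on the slides)."""
    return sum(weights.get(f, 0.0) * value for f, value in features.items())

def train_binary_perceptron(data, passes=10):
    """Binary perceptron: start with w = 0; classify; on a mistake, add the
    feature vector if the true label y* is +1 and subtract it if y* is -1."""
    w = {}
    for _ in range(passes):
        for features, y_star in data:            # y_star is +1 or -1
            y = 1 if dot(w, features) > 0 else -1
            if y != y_star:                      # wrong: adjust the weight vector
                for f, value in features.items():
                    w[f] = w.get(f, 0.0) + y_star * value
    return w

# Made-up training pairs (feature counts, label) just to run the update rule.
data = [({'BIAS': 1, 'free': 2, 'money': 1}, +1),   # spam-like
        ({'BIAS': 1, 'free': 0, 'money': 0}, -1)]   # ham-like
w = train_binary_perceptron(data)
print(w)
print(dot(w, {'BIAS': 1, 'free': 1, 'money': 1}) > 0)   # classified as +1 (spam)
```

The multiclass version keeps one weight vector per class and, on a mistake, subtracts the features from the predicted class's weights and adds them to the correct class's weights, exactly as described above.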

Next: Kernels and Clustering Source: http://www.doksinet CS 188: Artificial Intelligence Kernels and Clustering Instructors: Dan Klein and Pieter Abbeel --- University of California, Berkeley [These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://aiberkeleyedu] Source: http://www.doksinet Case-Based Learning Source: http://www.doksinet Non-Separable Data Source: http://www.doksinet Case-Based Reasoning ▪ Classification from similarity ▪ Case-based reasoning ▪ Predict an instance’s label using similar instances ▪ Nearest-neighbor classification ▪ 1-NN: copy the label of the most similar data point ▪ K-NN: vote the k nearest neighbors (need a weighting scheme) ▪ Key issue: how to define similarity ▪ Trade-offs: Small k gives relevant neighbors, Large k gives smoother functions http://www.cscmuedu/~zhuxj/courseproject/knndemo/KNNhtml Source: http://www.doksinet
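Before the variations on similarity that follow, here is a minimal k-nearest-neighbor sketch of "1-NN copies the nearest label, k-NN lets the k nearest vote." Squared Euclidean distance is used as one simple choice of (dis)similarity, but as the slide stresses, choosing the similarity function is the real design decision; the toy 2-D points are made up for illustration.

```python
from collections import Counter

def squared_distance(x, y):
    """Smaller means more similar; any other similarity could be swapped in."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def knn_classify(query, data, k=3):
    """k-NN: find the k most similar training points and let them vote.
    `data` is a list of (feature_vector, label) pairs."""
    neighbors = sorted(data, key=lambda pair: squared_distance(query, pair[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy 2-D training points, made up for illustration.
data = [((0.0, 0.0), 0), ((0.1, 0.2), 0),
        ((1.0, 1.0), 1), ((0.9, 1.1), 1)]
print(knn_classify((0.2, 0.1), data, k=3))   # 0: two of its three nearest neighbors
print(knn_classify((0.8, 0.9), data, k=1))   # 1: copies the single nearest label
```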

Parametric / Non-Parametric ▪ Parametric models: ▪ Fixed set of parameters ▪ More data means better settings ▪ Non-parametric models: ▪ Complexity of the classifier increases with data ▪ Better in the limit, often worse in the non-limit Truth ▪ (K)NN is non-parametric 2 Examples 10 Examples 100 Examples 10000 Examples Source: http://www.doksinet Nearest-Neighbor Classification ▪ Nearest neighbor for digits: ▪ Take new image ▪ Compare to all training images ▪ Assign based on closest example 0 1 ▪ Encoding: image is vector of intensities: 2 ▪ What’s the similarity function? 0 ▪ Dot product of two images vectors? 1 ▪ Usually normalize vectors so ||x|| = 1 ▪ min = 0 (when?), max = 1 (when?) 2 Source: http://www.doksinet Similarity Functions Source: http://www.doksinet Basic Similarity ▪ Many similarities based on feature dot products: ▪ If features are just the pixels: ▪ Note: not all similarities are of this form Source:

http://www.doksinet Invariant Metrics ▪ Better similarity functions use knowledge about vision ▪ Example: invariant metrics: ▪ Similarities are invariant under certain transformations ▪ Rotation, scaling, translation, stroke-thickness ▪ E.g: ▪ 16 x 16 = 256 pixels; a point in 256-dim space ▪ These points have small similarity in R256 (why?) ▪ How can we incorporate such invariances into our similarities? This and next few slides adapted from Xiao Hu, UIUC Source: http://www.doksinet Rotation Invariant Metrics ▪ Each example is now a curve in R256 ▪ Rotation invariant similarity: s’=max s( r( ), r( )) ▪ E.g highest similarity between images’ rotation lines Source: http://www.doksinet Template Deformation ▪ Deformable templates: ▪ ▪ ▪ ▪ An “ideal” version of each category Best-fit to image using min variance Cost for high distortion of template Cost for image points being far from distorted template ▪ Used in many commercial digit

recognizers Examples from [Hastie 94] Source: http://www.doksinet A Tale of Two Approaches ▪ Nearest neighbor-like approaches ▪ Can use fancy similarity functions ▪ Don’t actually get to do explicit learning ▪ Perceptron-like approaches ▪ Explicit training to reduce empirical error ▪ Can’t use fancy similarity, only linear ▪ Or can they? Let’s find out! Source: http://www.doksinet Kernelization Source: http://www.doksinet Perceptron Weights ▪ What is the final value of a weight wy of a perceptron? ▪ Can it be any real vector? ▪ No! It’s built by adding up inputs. ▪ Can reconstruct weight vectors (the primal representation) from update counts (the dual representation) Source: http://www.doksinet Dual Perceptron ▪ How to classify a new example x? ▪ If someone tells us the value of K for each pair of examples, never need to build the weight vectors (or the feature vectors)! Source: http://www.doksinet Dual Perceptron ▪ Start with zero

counts (alpha) ▪ Pick up training instances one by one ▪ Try to classify xn, ▪ If correct, no change! ▪ If wrong: lower count of wrong class (for this instance), raise count of right class (for this instance) Source: http://www.doksinet Kernelized Perceptron ▪ If we had a black box (kernel) K that told us the dot product of two examples x and x’: ▪ Could work entirely with the dual representation ▪ No need to ever take dot products (“kernel trick”) ▪ Like nearest neighbor – work with black-box similarities ▪ Downside: slow if many examples get nonzero alpha Source: http://www.doksinet Kernels: Who Cares? ▪ So far: a very strange way of doing a very simple calculation ▪ “Kernel trick”: we can substitute any* similarity function in place of the dot product ▪ Lets us learn new kinds of hypotheses * Fine print: if your kernel doesn’t satisfy certain technical requirements, lots of proofs break. Eg convergence, mistake bounds. In practice,

illegal kernels sometimes work (but not always). Source: http://www.doksinet Non-Linearity Source: http://www.doksinet Non-Linear Separators ▪ Data that is linearly separable works out great for linear decision rules: x 0 ▪ But what are we going to do if the dataset is just too hard? x 0 ▪ How about mapping data to a higher-dimensional space: x2 0 x This and next few slides adapted from Ray Mooney, UT Source: http://www.doksinet Non-Linear Separators ▪ General idea: the original feature space can always be mapped to some higherdimensional feature space where the training set is separable: Φ: x φ(x) Source: http://www.doksinet Some Kernels ▪ Kernels implicitly map original vectors to higher dimensional spaces, take the dot product there, and hand the result back ▪ Linear kernel: ▪ Quadratic kernel: ▪ RBF: infinite dimensional representation ▪ Discrete kernels: e.g string kernels Source: http://www.doksinet Why Kernels? ▪ Can’t you just

add these features on your own (e.g add all pairs of features instead of using the quadratic kernel)? ▪ ▪ ▪ ▪ Yes, in principle, just compute them No need to modify any algorithms But, number of features can get large (or infinite) Some kernels not as usefully thought of in their expanded representation, e.g RBF kernels ▪ Kernels let us compute with these features implicitly ▪ Example: implicit dot product in quadratic kernel takes much less space and time per dot product ▪ Of course, there’s the cost for using the pure dual algorithms: you need to compute the similarity to every training datum Source: http://www.doksinet Recap: Classification ▪ Classification systems: ▪ ▪ ▪ ▪ Supervised learning Make a prediction given evidence We’ve seen several methods for this Useful when you have labeled data Source: http://www.doksinet Clustering ▪ Clustering systems: ▪ Unsupervised learning ▪ Detect patterns in unlabeled data ▪ E.g group emails or

search results ▪ E.g find categories of customers ▪ E.g detect anomalous program executions ▪ Useful when don’t know what you’re looking for ▪ Requires data, but no labels ▪ Often get gibberish Source: http://www.doksinet Clustering Source: http://www.doksinet Clustering ▪ Basic idea: group together similar instances ▪ Example: 2D point patterns ▪ What could “similar” mean? ▪ One option: small (squared) Euclidean distance Source: http://www.doksinet K-Means Source: http://www.doksinet K-Means ▪ An iterative clustering algorithm ▪ Pick K random points as cluster centers (means) ▪ Alternate: ▪ Assign data instances to closest mean ▪ Assign each mean to the average of its assigned points ▪ Stop when no points’ assignments change Source: http://www.doksinet K-Means Example Source: http://www.doksinet K-Means as Optimization ▪ Consider the total distance to the means: means points assignments ▪ Each iteration reduces phi

▪ Two stages each iteration: ▪ Update assignments: fix means c, change assignments a ▪ Update means: fix assignments a, change means c Source: http://www.doksinet Phase I: Update Assignments ▪ For each point, re-assign to closest mean: ▪ Can only decrease total distance phi! Source: http://www.doksinet Phase II: Update Means ▪ Move each mean to the average of its assigned points: ▪ Also can only decrease total distance (Why?) ▪ Fun fact: the point y with minimum squared Euclidean distance to a set of points {x} is their mean Source: http://www.doksinet Initialization ▪ K-means is non-deterministic ▪ Requires initial means ▪ It does matter what you pick! ▪ What can go wrong? ▪ Various schemes for preventing this kind of thing: variance-based split / merge, initialization heuristics Source: http://www.doksinet K-Means Getting Stuck ▪ A local optimum: Why doesn’t this work out like the earlier example, with the purple taking over half the

blue? Source: http://www.doksinet K-Means Questions ▪ Will K-means converge? ▪ To a global optimum? ▪ Will it always find the true patterns in the data? ▪ If the patterns are very very clear? ▪ Will it find something interesting? ▪ Do people ever use it? ▪ How many clusters to pick? Source: http://www.doksinet Agglomerative Clustering Source: http://www.doksinet Agglomerative Clustering ▪ Agglomerative clustering: ▪ First merge very similar instances ▪ Incrementally build larger clusters out of smaller clusters ▪ Algorithm: ▪ Maintain a set of clusters ▪ Initially, each instance in its own cluster ▪ Repeat: ▪ Pick the two closest clusters ▪ Merge them into a new cluster ▪ Stop when there’s only one cluster left ▪ Produces not one clustering, but a family of clusterings represented by a dendrogram Source: http://www.doksinet Agglomerative Clustering ▪ How should we define “closest” for clusters with multiple elements? ▪ Many

options ▪ ▪ ▪ ▪ Closest pair (single-link clustering) Farthest pair (complete-link clustering) Average of all pairs Ward’s method (min variance, like k-means) ▪ Different choices create different clustering behaviors Source: http://www.doksinet Example: Google News Top-level categories: supervised classification Story groupings: unsupervised clustering 43 Source: http://www.doksinet Next Time: Advanced Applications! Source: http://www.doksinet CS 188: Artificial Intelligence NLP, Games, and Robotic Cars * Instructors: Dan Klein and Pieter Abbeel --- University of California, Berkeley [These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://aiberkeleyedu] Source: http://www.doksinet So Far: Foundational Methods Source: http://www.doksinet Now: Advanced Applications Source: http://www.doksinet Natural Language Processing Source: http://www.doksinet What is NLP? ▪

Fundamental goal: analyze and process human language, broadly, robustly, accurately ▪ End systems that we want to build: ▪ Ambitious: speech recognition, machine translation, information extraction, dialog interfaces, question answering ▪ Modest: spelling correction, text categorization Source: http://www.doksinet Problem: Ambiguities ▪ Headlines: ▪ ▪ ▪ ▪ ▪ ▪ ▪ ▪ Enraged Cow Injures Farmer With Ax Hospitals Are Sued by 7 Foot Doctors Ban on Nude Dancing on Governor’s Desk Iraqi Head Seeks Arms Local HS Dropouts Cut in Half Juvenile Court to Try Shooting Defendant Stolen Painting Found by Tree Kids Make Nutritious Snacks ▪ Why are these funny? Source: http://www.doksinet Parsing as Search Source: http://www.doksinet Grammar: PCFGs ▪ Natural language grammars are very ambiguous! ▪ PCFGs are a formal probabilistic model of trees ▪ Each “rule” has a conditional probability (like an HMM) ▪ Tree’s probability is the product of all rules

used ▪ Parsing: Given a sentence, find the best tree – search! ROOT S 375/420 S NP VP . 320/392 NP PRP 127/539 VP VBD ADJP . 32/401 Source: http://www.doksinet Syntactic Analysis Hurricane Emily howled toward Mexico s Caribbean coast on Sunday packing 135 mph winds and torrential rain and causing panic in Cancun, where frightened tourists squeezed into musty shelters. [Demo: Berkeley NLP Group Parser http://tomato.banataoberkeleyedu:8080/parser/parserhtml] Source: http://www.doksinet Dialog Systems Source: http://www.doksinet ELIZA ▪ A “psychotherapist” agent (Weizenbaum, ~1964) ▪ Led to a long line of chatterbots ▪ How does it work: ▪ Trivial NLP: string match and substitution ▪ Trivial knowledge: tiny script / response database ▪ Example: matching “I remember ” results in “Do you often think of ”? ▪ Can fool some people some of the time? [Demo: http://nlp-addiction.com/eliza] Source: http://www.doksinet Watson Source:

http://www.doksinet What’s in Watson? ▪ A question-answering system (IBM, 2011) ▪ Designed for the game of Jeopardy ▪ How does it work: ▪ Sophisticated NLP: deep analysis of questions, noisy matching of questions to potential answers ▪ Lots of data: onboard storage contains a huge collection of documents (e.g Wikipedia, etc), exploits redundancy ▪ Lots of computation: 90+ servers ▪ Can beat all of the people all of the time? Source: http://www.doksinet Machine Translation Source: http://www.doksinet Machine Translation ▪ Translate text from one language to another ▪ Recombines fragments of example translations ▪ Challenges: ▪ What fragments? [learning to translate] ▪ How to make efficient? [fast translation search] Source: http://www.doksinet The Problem with Dictionary Lookups 16 Source: http://www.doksinet MT: 60 Years in 60 Seconds Source: http://www.doksinet Data-Driven Machine Translation Source: http://www.doksinet Learning to

Translate Source: http://www.doksinet An HMM Translation Model 20 Source: http://www.doksinet Levels of Transfer Source: http://www.doksinet Example: Syntactic MT Output [ISI MT system output] 24 Source: http://www.doksinet Starcraft Source: http://www.doksinet Starcraft Source: http://www.doksinet What is Starcraft? Image from Ben Weber Source: http://www.doksinet Why is Starcraft Hard? ▪ The game of Starcraft is: ▪ ▪ ▪ ▪ ▪ ▪ ▪ ▪ Adversarial Long Horizon Partially Observable Realtime Huge branching factor Concurrent Resource-rich ▪ No single algorithm (e.g minimax) will solve it off-the-shelf! Source: http://www.doksinet Starcraft AIs: AIIDE 2010 ▪ 28 Teams: international entrants, universities, research labs Source: http://www.doksinet The Berkeley Overmind Search: path planning CSPs: base layout Minimax: targeting Learning: micro control Inference: tracking units Scheduling: resources Hierarchical control

http://overmind.eecsberkeleyedu Source: http://www.doksinet Search for Pathing [Pathing] Source: http://www.doksinet Source: http://www.doksinet Minimax for Targeting [Targeting] Source: http://www.doksinet Source: http://www.doksinet Machine Learning for Micro Control [RL, Potential Fields] Source: http://www.doksinet Source: http://www.doksinet Inference / VPI / Scouting [Scouting] Source: http://www.doksinet Source: http://www.doksinet AIIDE 2010 Competition Source: http://www.doksinet Autonomous Driving Source: http://www.doksinet Grand Challenge 2005: Barstow, CA, to Primm, NV ▪ ▪ ▪ ▪ 150 mile off-road robot race across the Mojave desert Natural and manmade hazards No driver, no remote control No dynamic passing Source: http://www.doksinet Autonomous Vehicles [Video: Race, Short] [VIDEO: GC Bad] Autonomous vehicle slides adapted from Sebastian Thrun Source: http://www.doksinet Grand Challenge 2005 Nova Video [VIDEO:

nova-race-supershort.mp4] Source: http://www.doksinet Grand Challenge 2005 – Bad [VIDEO: grand challenge – bad.wmv] Source: http://www.doksinet An Autonomous Car GPS GPS compass 6 Computers IMU E-stop 5 Lasers Camera Radar Control Screen Steering motor Source: http://www.doksinet Actions: Steering Control Error Steering Angle (with respect to trajectory) Source: http://www.doksinet Laser Readings for Flat / Empty Road 3 2 1 Source: http://www.doksinet Laser Readings for Road with Obstacle DZ Source: http://www.doksinet Obstacle Detection Trigger if |Zi-Zj| > 15cm for nearby zi, zj Raw Measurements: 12.6% false positives Source: http://www.doksinet Probabilistic Error Model GPS IMU GPS IMU GPS IMU xt xt+1 xt+2 zt zt+1 zt+2 Source: http://www.doksinet HMMs for Detection Raw Measurements: 12.6% false positives HMM Inference: 0.02% false positives Source: http://www.doksinet Sensors: Camera Source: http://www.doksinet Vision for a Car

Source: http://www.doksinet Vision for a Car [VIDEO: lidar vision for a car] Source: http://www.doksinet Self-Supervised Vision Source: http://www.doksinet Self-Supervised Vision [VIDEO: self-supervised vision] Source: http://www.doksinet Urban Environments Source: http://www.doksinet Sensors: Laser Readings [VIDEO: Urban Challenge (Lidar)] Source: http://www.doksinet Environmental Tracking [VIDEO: Urban Challenge (People)] Source: http://www.doksinet Google Self-Driving Car [VIDEO: ROBOTICS – gcar.m4v] Source: http://www.doksinet Next Time: Computer Vision, Robotic Helicopters, and Walking Robots! Source: http://www.doksinet CS 188: Artificial Intelligence Advanced Applications: Computer Vision and Robotics * Instructors: Dan Klein and Pieter Abbeel --- University of California, Berkeley [These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://aiberkeleyedu] Source:

http://www.doksinet Computer Vision Source: http://www.doksinet Object Detection Source: http://www.doksinet Object Detection Approach 1: HOG + SVM Source: http://www.doksinet Features and Generalization [Dalal and Triggs, 2005] Source: http://www.doksinet Features and Generalization Image HoG Source: http://www.doksinet Training ▪ Round 1 ▪ Training set = ▪ Positive examples: from labeling ▪ Negative examples: random patches preliminary SVM ▪ Round 2 (“bootstrapping” or “mining hard negatives”) ▪ Training set = ▪ Positive examples: from labeling ▪ Negative examples: patches that have score >= -1 final SVM Source: http://www.doksinet State-of-the-art Results sofa bottle cat [Girschik, Felzenszwalb, McAllester] Source: http://www.doksinet State-of-the-art Results person car horse [Girschik, Felzenszwalb, McAllester] Source: http://www.doksinet Object Detection Approach 2: Deep Learning Source: http://www.doksinet How

How Many Computers to Identify a Cat?
“Google Brain” [Le, Ng, Dean, et al, 2012]

Perceptron
(diagram: inputs f1, f2, f3 with weights w1, w2, w3 feed a sum Σ, followed by a threshold test: > 0?)

Two-Layer Neural Network
(diagram: each hidden unit takes a thresholded weighted sum of the inputs, with weights w_ij; the output unit then takes a thresholded weighted sum of the hidden activations, with weights w1, w2, w3)

N-Layer Neural Network
(diagram: inputs f1, f2, f3 pass through a stack of thresholded weighted-sum layers)

Hill Climbing
▪ Simple, general idea:
  ▪ Start wherever
  ▪ Repeat: move to the best neighboring state
  ▪ If no neighbors better than current, quit
  ▪ Neighbors = small perturbations of w
▪ Property
  ▪ Many local optima --> How to find a good local optimum?
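To connect the Perceptron and Hill Climbing slides, here is a small illustrative sketch (not course project code): a single thresholded unit whose weight vector w is tuned by hill climbing over random small perturbations. It samples one random neighbor per step and runs for a fixed number of steps, a simplification of the "move to the best neighboring state" rule on the slide; the data and parameters are made up.

    # A perceptron-style unit y = [w . f > 0], with weights tuned by hill
    # climbing: try a small perturbation of w, keep it if it scores better.
    import random

    def predict(w, f):
        return 1 if sum(wi * fi for wi, fi in zip(w, f)) > 0 else 0

    def accuracy(w, data):
        return sum(predict(w, f) == y for f, y in data) / len(data)

    def hill_climb(data, dim, steps=200, step_size=0.5, seed=0):
        rng = random.Random(seed)
        w = [0.0] * dim
        best = accuracy(w, data)
        for _ in range(steps):
            # Neighbor = small perturbation of w (as on the Hill Climbing slide).
            neighbor = [wi + rng.uniform(-step_size, step_size) for wi in w]
            score = accuracy(neighbor, data)
            if score > best:            # move only to better neighbors
                w, best = neighbor, score
        return w, best

    # Toy data: label is 1 when the first feature exceeds the second.
    data = [((x1, x2, 1.0), int(x1 > x2))
            for x1, x2 in [(0.9, 0.1), (0.2, 0.8), (0.7, 0.3), (0.1, 0.5)]]
    w, acc = hill_climb(data, dim=3)
    print(w, acc)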

Auto-Encoder (Crude Idea Sketch)
(diagram: a small network trained to reproduce its inputs f1, f2, f3 at its outputs through a hidden layer)

Training Procedure: Stacked Auto-Encoder
▪ Auto-encoder
  ▪ Layer 1 = “compressed” version of the input layer
▪ Stacked auto-encoder
  ▪ For every image, make a compressed image (= Layer 1's response to the image)
  ▪ Learn Layer 2 by using the compressed images both as input and as the output to be predicted
  ▪ Repeat similarly for Layers 3, 4, etc.
▪ Some details left out
  ▪ Typically, in between layers the responses of several neurons get agglomerated (“pooling” / “complex cells”)

Final Result: Trained Neural Network
(diagram: inputs f1, f2, ..., fN feed a stack of thresholded weighted-sum layers, ending in a final output unit)
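Below is a rough sketch of the greedy layer-wise idea on the Stacked Auto-Encoder slide, with a tiny tied-weight linear auto-encoder standing in for each layer. Nonlinearities, pooling, and real image data are omitted, and all sizes, learning rates, and names are made up.

    # Greedy layer-wise training sketch: each layer is a small linear
    # auto-encoder (X ~= X W W^T), and the next layer trains on the previous
    # layer's "compressed" responses. Illustration only.
    import numpy as np

    def train_autoencoder_layer(X, hidden, lr=0.01, epochs=500, seed=0):
        """Learn W so that (X @ W) @ W.T reconstructs X; return the compressed
        representation X @ W and the weights."""
        rng = np.random.default_rng(seed)
        W = rng.normal(scale=0.1, size=(X.shape[1], hidden))
        for _ in range(epochs):
            H = X @ W                    # layer response ("compressed" images)
            R = H @ W.T                  # reconstruction of the input
            grad = -2 * (X - R).T @ H / len(X) - 2 * X.T @ (X - R) @ W / len(X)
            W -= lr * grad
        return X @ W, W

    rng = np.random.default_rng(1)
    images = rng.normal(size=(100, 32))          # stand-in for image features

    # Layer 1 compresses the raw inputs; Layer 2 compresses Layer 1's responses.
    layer1, W1 = train_autoencoder_layer(images, hidden=16)
    layer2, W2 = train_autoencoder_layer(layer1, hidden=8)
    print(images.shape, layer1.shape, layer2.shape)   # (100, 32) (100, 16) (100, 8)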

Robotics

Robotic Helicopters

Motivating Example
▪ How do we execute a task like this? [VIDEO: tictoc results.wmv]

Autonomous Helicopter Flight
▪ Key challenges:
  ▪ Track helicopter position and orientation during flight
  ▪ Decide on control inputs to send to helicopter

Autonomous Helicopter Setup
▪ On-board inertial measurement unit (IMU)
▪ Position
▪ Send out controls to helicopter

HMM for Tracking the Helicopter
▪ State:
▪ Measurements: [observation update]
  ▪ 3-D coordinates from vision, 3-axis magnetometer, 3-axis gyro, 3-axis accelerometer
▪ Transitions (dynamics): [time elapse update]
  ▪ s_{t+1} = f(s_t, a_t) + w_t, where f encodes the helicopter dynamics and w is noise

Helicopter MDP
▪ State:
▪ Actions (control inputs):
  ▪ a_lon : main rotor longitudinal cyclic pitch control (affects pitch rate)
  ▪ a_lat : main rotor lateral cyclic pitch control (affects roll rate)
  ▪ a_coll : main rotor collective pitch (affects main rotor thrust)
  ▪ a_rud : tail rotor collective pitch (affects tail rotor thrust)
▪ Transitions (dynamics):
  ▪ s_{t+1} = f(s_t, a_t) + w_t  [f encodes helicopter dynamics] [w is a probabilistic noise model]
▪ Can we solve the MDP yet?
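The transition model s_{t+1} = f(s_t, a_t) + w_t is easy to illustrate with a toy linear f and Gaussian noise. The matrices below are invented for illustration and are not the identified helicopter dynamics.

    # Toy illustration of the stochastic transition model s_{t+1} = f(s_t, a_t) + w_t.
    import numpy as np

    rng = np.random.default_rng(0)
    A = np.array([[1.0, 0.1], [0.0, 0.9]])   # made-up state dynamics (2-D state)
    B = np.array([[0.0], [0.5]])             # made-up control effect (1-D action)
    noise_std = 0.05

    def step(s, a):
        """One stochastic transition: f(s, a) plus process noise w."""
        w = rng.normal(scale=noise_std, size=s.shape)
        return A @ s + B @ a + w

    s = np.array([0.0, 0.0])
    for t in range(5):
        a = np.array([1.0])                  # a constant control input
        s = step(s, a)
        print(t, s)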

Problem: What's the Reward?
▪ Reward for hovering:

Hover
[VIDEO: ihover-newencoding09.wmv] [Ng et al, 2004]

Problem: What's the Reward?
▪ Rewards for "Flip"?
▪ Problem: what's the target trajectory?
▪ Just write it down by hand?

Flips (?)
[VIDEO: 20061204---bad.wmv]

Helicopter Apprenticeship?

Demonstrations
[VIDEO: airshow unaligned.wmv]

Learning a Trajectory
(diagram labels: Hidden, Demo 1, Demo 2)
▪ HMM-like generative model
  ▪ Dynamics model used as HMM transition model
  ▪ Demos are observations of the hidden trajectory
▪ Problem: how do we align observations to the hidden trajectory?
[Abbeel, Coates, Ng, IJRR 2010]

Probabilistic Alignment using a Bayes' Net
(diagram labels: Hidden, Demo 1, Demo 2)
▪ Dynamic Time Warping (Needleman & Wunsch 1970; Sakoe & Chiba, 1978)
▪ Extended Kalman filter / smoother
[Abbeel, Coates, Ng, IJRR 2010]

Aligned Demonstrations
[VIDEO: airshow unaligned.wmv]

Alignment of Samples
▪ Result: inferred sequence is much cleaner!
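The alignment slide mentions dynamic time warping; a minimal DTW cost computation between two 1-D sequences is sketched below. The real system aligns full multi-dimensional demonstration trajectories inside an EKF-based model, so this is illustration only.

    # Classic O(n*m) dynamic time warping with absolute-difference local cost.
    def dtw(a, b):
        """Return the DTW cost between 1-D sequences a and b."""
        INF = float("inf")
        n, m = len(a), len(b)
        D = [[INF] * (m + 1) for _ in range(n + 1)]
        D[0][0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = abs(a[i - 1] - b[j - 1])
                # extend the warp by a match, an insertion, or a deletion
                D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
        return D[n][m]

    # Two demos of "the same" motion, one executed more slowly than the other.
    demo1 = [0, 1, 2, 3, 2, 1, 0]
    demo2 = [0, 0, 1, 1, 2, 2, 3, 3, 2, 2, 1, 1, 0, 0]
    print(dtw(demo1, demo2))   # small cost despite the different lengths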

Final Behavior
[VIDEO: airshow trimmed.wmv] [Abbeel, Coates, Quigley, Ng, 2010]

Legged Locomotion

Quadruped
▪ Low-level control problem: moving a foot into a new location → search with a successor function ~ moving the motors
▪ High-level control problem: where should we place the feet?
▪ Reward function R(s) = w · f(s)  [25 features]
[Kolter, Abbeel & Ng, 2008]

Experimental setup
▪ Demonstrate a path across the "training terrain"
▪ Run apprenticeship learning to learn the reward function
▪ Receive the "testing terrain" --- a height map
▪ Find the optimal policy with respect to the learned reward function for crossing the testing terrain
[Kolter, Abbeel & Ng, 2008]

Without learning
[VIDEO: quad initial.wmv]

With learned reward function
[VIDEO: quad initial.wmv]
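The quadruped reward on the slide above is linear in hand-designed features, R(s) = w · f(s). The sketch below shows the shape of such a reward and how it scores candidate footholds; the four features and the weights are made up (the actual system uses about 25 features and learns w by apprenticeship).

    # Linear reward over hand-designed terrain features (illustration only).
    import numpy as np

    def features(s):
        """Map a candidate foothold s = (x, y, height, slope) to a feature vector."""
        x, y, height, slope = s
        return np.array([
            1.0,                 # bias
            -abs(slope),         # prefer flat footholds
            -height ** 2,        # prefer low footholds
            -np.hypot(x, y),     # prefer footholds near the body
        ])

    w = np.array([1.0, 2.0, 0.5, 0.3])     # learned weights would replace these

    def reward(s):
        return float(w @ features(s))

    candidates = [(0.2, 0.1, 0.05, 0.0), (0.3, 0.2, 0.30, 0.8)]
    best = max(candidates, key=reward)
    print(best, reward(best))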

Next Time
▪ Final Contest results
▪ Robot butlers
▪ Where to go next to learn more about AI

CS 188: Artificial Intelligence
Conclusion
Instructors: Dan Klein and Pieter Abbeel --- University of California, Berkeley
[These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu]

Contest Results

P1 Mini-Contest Results!

P1 Mini-Contest Results
▪ 1st place: Zephyr – Kevin Casey and Zeyu (Andrew) Liu
▪ 2nd place: Artificially.Retarded – Fan Ye and Tianhao Zhang
▪ 3rd place: Tooskoolforkool – Austen Satterlee

P2 Mini-Contest Results!

P2 Mini-Contest Results
1  [1815]  ^ ^                  Alex Sawai and Zaheen Aziz
   [1737]  Staffodil            Staff
2  [1692]  EvenOurselvesCanNot  Fan Ye and TianHao Zhang
   [1676]  Staff Infection      Staff
   [1667]  Stapphire            Staff
3  [1650]  89temples.exe        Joseph Hui and Alan Yao

P3 Mini-Contest Results!

P3 Mini-Contest Results
1  [1760]  aisaka-taiga         Joseph Hui and Kevin Casey
   [1747]  Staffodil            Staff
2  [1721]  BOTS                 Keonhwa Song and Young Seong Kim
3  [1704]  Catch me if you can  Fan Ye and Tianhao Zhang

Final Contest

Final Contest Results!
▪ Challenges: long-term strategy, multiple agents, adversarial utilities, uncertainty about other agents' positions, plans, etc.

Final Contest Statistics
▪ 24 teams, thousands of matches!
▪ Naming trends:
  ▪ Creative names:
    ▪ best baseline evar
    ▪ ~ An instructor (Pieter Abbeel) thinks this is an ok bot ~
    ▪ Touch My Botty, Put Me On The Floor, Wrestle Me Around, Play With Me Some More
  ▪ Clear intent names:
    ▪ Uninstall
    ▪ placeholder
▪ Great work by everyone!
▪ Final results: now

Top-10
   [1657]  Staffy Duck            Staff
   [1651]  Staffeinated Blitzbot  Staff
   [1616]  Staffosaurus Rex       Staff
4  [1611]  Begierde des Zauberer  Alexander Irpan and Sumeet Jain
5  [1610]  4v9vaivaehjvaekfaiufecadfdiofudsoiufoisfdoiufsdasfdaiuofsdaoiusfdaoiufsdaouifsa]yfsa]yaevwyuavbuaewbiuaewrbyuieyurfbyufbyrwyuerwahyuoawchioucfeaouievouiy58avzdkjgfegf3yufoia3w8fgufieaouogifweaugoifeawouiefwabi]uvaoiuvaoiurvrvbywevivr]ivrabuivry]roebyivubo  James Maa and Kyle Ong
6  [1594]  werr234                Rohan Chitnis and Harrison Wang
   [1587]  Stapphire              Staff
7  [1521]  Several Stupid Squirrels Swallowed Some Superfluous Siracha Sauce  Sean Hickey
8  [1496]  try                    Boy Chen
9  [1462]  haikuBot               Sumukh Sridhara
10 [1457]  [$(1)$]                James Wong and Jeffrey Zhang

Top Five Teams
▪ Top 5: M45H1R0N
  Team Members: Joseph Hui and Kevin Casey
▪ Top 5: screwdriver
  Team Members: Fan Ye and Tianhao Zhang
▪ Top 5: Touch My Botty, Put Me On The Floor, Wrestle Me Around, Play With Me Some More
  Team Members: Oliver He and Alec Spencer
▪ Top 5: Staffodil
  Team Members: staff
▪ Top 5: Staff Infection
  Team Members: staff

For (not) 3rd Place
▪ screwdriver
  Team Members: Fan Ye and Tianhao Zhang
VS
▪ M45H1R0N
  Team Members: Joseph Hui and Kevin Casey

https://contest.cs188.org/contest matches/36114

For 1st (and 2nd) place
▪ Touch My Botty
  Team Members: Oliver He and Alec Spencer
VS
▪ M45H1R0N
  Team Members: Joseph Hui and Kevin Casey
https://contest.cs188.org/contest matches/35587

Vs Staff
▪ Staffodil
  Team Members: CS188 Staff
VS
▪ M45H1R0N
  Team Members: Joseph Hui and Kevin Casey
https://contest.cs188.org/contest matches/34195

Top-3
1  [1861]  M45H1R0N         Joseph Hui and Kevin Casey
2  [1809]  Touch My Botty   Oliver He and Alec Spencer
   [1769]  Staffodil        Staff
   [1767]  Staff Infection  Staff
3  [1675]  screwdriver      Fan Ye and Tianhao Zhang

Ketrina Yim
CS188 Artist

Personal Robotics

PR1
[VIDEO: pr1-montage-short.wmv] [Wyrobek, Berger, van der Loos, Salisbury, 2008]

PR2 (autonomous)
[VIDEO: 5pile 200x.mp4] [Maitin-Shepard, Cusumano-Towner, Lei, Abbeel, 2010]

Apprenticeship

Generalizing Trajectories
▪ The problem
  ▪ Human demonstrated a knot-tie in this rope
  ▪ Robot has to tie a knot in this rope

Cartoon Problem Setting
▪ Train situation: demonstration trajectory
▪ Test situation: what trajectory here?
▪ Samples of f : R^3 → R^3 (point correspondences between the train and test situations)

Learning f : R^3 → R^3 from samples
▪ Observations
  ▪ Translations, rotations and scaling are FREE
  ▪ Can be solved efficiently by manipulating matrices whose size is the number of examples
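As a stand-in for learning f : R^3 → R^3 from samples, the sketch below fits an affine warp to corresponding points by least squares and applies it to a demonstration trajectory. The published approach uses a regularized non-rigid registration rather than an affine fit, and all points and names here are made up.

    # Highly simplified stand-in: fit f(x) = A x + b to corresponding 3-D points.
    import numpy as np

    def fit_affine_warp(X_train, X_test):
        """X_train, X_test: (n, 3) arrays of corresponding points. Returns (A, b)."""
        n = len(X_train)
        H = np.hstack([X_train, np.ones((n, 1))])       # homogeneous coordinates
        M, *_ = np.linalg.lstsq(H, X_test, rcond=None)  # solves H @ M ~= X_test
        A, b = M[:3].T, M[3]
        return A, b

    def apply_warp(A, b, points):
        return points @ A.T + b

    # Corresponding rope points in the train and test situations (made up):
    # the test scene is the train scene rotated 90 degrees and shifted.
    X_train = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 1.0]])
    X_test = X_train @ np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1.0]]).T + np.array([2, 0, 0])

    A, b = fit_affine_warp(X_train, X_test)
    demo_trajectory = np.array([[0.5, 0.5, 0.0], [0.8, 0.2, 0.1]])
    print(apply_warp(A, b, demo_trajectory))   # demo trajectory mapped to the test scene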

Autonomous tying of a knot for previously unseen situations
[VIDEO: knots apprentice.mp4] [Schulman, Ho, Lee, Abbeel, 2013]

Experiment: Suturing
[VIDEO: suturing-short-sped-up.mp4] [Schulman, Gupta, Venkatesan, Tayson-Frederick, Abbeel, 2013]

Looking Ahead

Pac-Man Beyond the Game!

Pacman: Beyond Simulation?
Students at Colorado University: http://pacman.elstonj.com

Pacman: Beyond Simulation!
[VIDEO: Roomba Pacman.mp4]

Bugman?
▪ AI = Animal Intelligence?
▪ Wim van Eck at Leiden University
  ▪ Pacman controlled by a human
  ▪ Ghosts controlled by crickets
  ▪ Vibrations drive crickets toward or away from Pacman's location
http://pong.hku.nl/~wim/bugman.htm

Bugman
[VIDEO: bugman movie 1.mov]

Where to Go Next?

Where to go next?
▪ Congratulations, you've seen the basics of modern AI
  ▪ ... and done some amazing work putting it to use!
▪ How to continue:
  ▪ Machine learning: cs189, stat154
  ▪ Intro to Data Science: CS194-16 (Franklin)
  ▪ Probability: ee126, stat134
  ▪ Optimization: ee127
  ▪ Cognitive modeling: cog sci 131
  ▪ Machine learning theory: cs281a/b
  ▪ Vision: cs280
  ▪ Robotics: cs287
  ▪ NLP: cs288
  ▪ and more; ask if you're interested

That's It!
▪ Help us out with some course evaluations
▪ Have a great summer, and always maximize your expected utilities!
