
Java concurrency (multi-threading)

Java concurrency (multi-threading). This article describes how to do concurrent programming with Java. It covers the concepts of parallel programming, immutability, threads, the executor framework (thread pools), futures, callables, CompletableFuture and the fork-join framework.

1. Concurrency

1.1. What is concurrency?

Concurrency is the ability to run several programs or several parts of a program in parallel. If a time-consuming task can be performed asynchronously or in parallel, this improves the throughput and the interactivity of the program.

A modern computer has several CPUs or several cores within one CPU. The ability to leverage these cores can be the key to a successful high-volume application.

1.2. Process vs. threads

A process runs independently and isolated from other processes. It cannot directly access shared data in other processes. The resources of the process, e.g. memory and CPU time, are allocated to it via the operating system.

A thread is a so-called lightweight process. It has its own call stack, but can access shared data of other threads in the same process. Every thread has its own memory cache. If a thread reads shared data, it stores this data in its own memory cache and re-reads it from main memory only under certain conditions, e.g. when synchronization requires it.

A Java application runs by default in one process. Within a Java application you work with several threads to achieve parallel processing or asynchronous behavior.

2. Improvements and issues with concurrency

2.1. Limits of concurrency gains

Concurrency promises to perform certain tasks faster, as these tasks can be divided into subtasks and the subtasks can be executed in parallel. Of course the runtime is limited by the parts of the task which cannot be performed in parallel.

The theoretically possible performance gain can be calculated by the following rule, which is referred to as Amdahl's Law.

If F is the fraction of the program which cannot run in parallel and N is the number of processors, then the maximum performance gain (speedup) is 1 / (F + (1 - F) / N).
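As a quick worked example (the numbers are hypothetical): with F = 0.1 and N = 4, the maximum speedup is 1 / (0.1 + 0.9 / 4), roughly 3.08. The following minimal Java sketch just evaluates the formula:

public class AmdahlExample {
    public static void main(String[] args) {
        double f = 0.1; // fraction of the program that cannot run in parallel (hypothetical)
        int n = 4;      // number of processors (hypothetical)
        double maxSpeedup = 1 / (f + (1 - f) / n);
        System.out.println("Maximum speedup: " + maxSpeedup); // roughly 3.08
    }
}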

2.2. Concurrency issues

Threads have their own call stack, but can also access shared data. Therefore you have two basic problem categories: visibility problems and access problems.

A visibility problem occurs if thread A reads shared data which is later changed by thread B and thread A is unaware of this change.

An access problem occurs if several threads access and change the same shared data at the same time.

Visibility and access problems can lead to

  • Liveness failure: The program does not react anymore due to problems in the concurrent access of data, e.g. deadlocks.
  • Safety failure: The program creates incorrect data.
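As a hedged illustration of such an access problem (this hypothetical snippet is not part of the original article), two threads increment a shared counter without synchronization; the final value is usually less than 20000 because updates get lost:

public class RaceConditionExample {
    private static int counter = 0; // shared data, not synchronized

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            for (int i = 0; i < 10000; i++) {
                counter++; // read-modify-write, not atomic
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        // typically prints less than 20000 because some increments were lost
        System.out.println(counter);
    }
}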

3. Concurrency in Java

3.1. Processes and Threads

A Java program runs in its own process and by default in one thread. Java supports threads as part of the Java language via the Thread class. The Java application can create new threads via this class.

Java 1.5 also provides improved support for concurrency with the java.util.concurrent package.

3.2. Locks and thread synchronization

Java provides locks to protect certain parts of the code from being executed by several threads at the same time. The simplest way of locking a certain method or Java class is to define the method or class with the synchronized keyword.

The synchronized keyword in Java ensures:

  • that only a single thread can execute a block of code at the same time
  • that each thread entering a synchronized block of code sees the effects of all previous modifications that were guarded by the same lock

Synchronization is necessary for mutually exclusive access to blocks of code and for reliable communication between threads.

You can use the synchronized keyword for the definition of a method. This ensures that only one thread can enter this method at the same time. Another thread which calls this method would wait until the first thread leaves the method.

public synchronized void critical() {
    // some thread critical stuff
    // here
}

You can also use the synchronized keyword to protect blocks of code within a method. This block is guarded by an object which serves as the lock; any object, for example a String, can be used as the lock.

All code which is protected by the same lock can only be executed by one thread at the same time.

For example, the following data structure ensures that only one thread at a time can execute the inner block of the add() and next() methods.

package de.vogella.pagerank.crawler;

import java.util.ArrayList;
import java.util.List;

/**
 * Data structure for a web crawler. Keeps track of the visited sites and keeps
 * a list of sites which still need to be crawled.
 *
 * @author Lars Vogel
 *
 */
public class CrawledSites {
    private List<String> crawledSites = new ArrayList<String>();
    private List<String> linkedSites = new ArrayList<String>();

    public void add(String site) {
        synchronized (this) {
            if (!crawledSites.contains(site)) {
                linkedSites.add(site);
            }
        }
    }

    /**
     * Get next site to crawl. Can return null (if nothing to crawl)
     */
    public String next() {
        // quick check without holding the lock; must be re-checked under the lock
        if (linkedSites.size() == 0) {
            return null;
        }
        synchronized (this) {
            // check again: another thread may have changed the list in the meantime
            if (linkedSites.size() > 0) {
                String s = linkedSites.get(0);
                linkedSites.remove(0);
                crawledSites.add(s);
                return s;
            }
            return null;
        }
    }

}

3.3. Volatile

If a variable is declared with the volatile keyword, it is guaranteed that any thread that reads the field will see the most recently written value. The volatile keyword does not perform any mutually exclusive locking on the variable.

As of Java 5, a write to a volatile variable also makes all variables which were modified by the same thread before that write visible to other threads that subsequently read the volatile variable. This can also be used to safely publish the values inside a referenced object, e.g. for a volatile variable person: first initialize a temporary object via its setters and then assign the temporary variable to the volatile variable. This makes both the changed reference and the values of the referenced object visible to other threads.
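A minimal hedged sketch of this publication idiom (the Person class here is hypothetical and only used for illustration):

public class VolatilePublication {

    // hypothetical helper class, for illustration only
    static class Person {
        private String name;
        void setName(String name) { this.name = name; }
        String getName() { return name; }
    }

    private volatile Person person;

    public void publish(String name) {
        // initialize a temporary object first ...
        Person temp = new Person();
        temp.setName(name);
        // ... then assign it to the volatile field; the volatile write also
        // makes the previously written name visible to other threads
        person = temp;
    }

    public String read() {
        Person p = person; // volatile read
        return p == null ? null : p.getName();
    }
}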

4. The Java memory model

4.1. Overview

The Java memory model describes the communication between the memory of the threads and the main memory of the application.

It defines the rules for how changes made to memory by one thread are propagated to other threads.

The Java memory model also defines the situations in which a thread refreshes its own memory from the main memory.

It also describes which operations are atomic and the ordering of the operations.

4.2. Atomic operation

An atomic operation is an operation which is performed as a single unit of work without the possibility of interference from other operations.

The Java language specification guarantees that reading or writing a variable is an atomic operation (unless the variable is of type long or double). Reads and writes of variables of type long or double are only atomic if the variables are declared with the volatile keyword.

Assume i is defined as int. The i++ (increment) operation is not an atomic operation in Java. This also applies to the other numeric types, e.g. long.

The i++ operation first reads the value which is currently stored in i (an atomic operation), then adds one to it and writes the result back (the write itself is again atomic). But between the read and the write the value of i might have changed.

Since Java 1.5 the Java platform provides atomic variable classes, e.g. AtomicInteger or AtomicLong, which provide atomic methods like getAndDecrement(), getAndIncrement() and getAndSet().
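As a small hedged sketch (class and method names are hypothetical), the difference between a plain counter and an atomic one looks like this:

import java.util.concurrent.atomic.AtomicInteger;

public class CounterComparison {
    private int plainCounter = 0;                        // i++ on this field is not atomic
    private final AtomicInteger atomicCounter = new AtomicInteger();

    public void unsafeIncrement() {
        plainCounter++;                                  // read, add, write: updates can be lost
    }

    public int safeIncrement() {
        return atomicCounter.incrementAndGet();          // atomic read-modify-write
    }
}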

4.3. Memory updates in synchronized code

The Java memory model guarantees that each thread entering a synchronized block of code sees the effects of all previous modifications that were guarded by the same lock.

5. Immutability and Defensive Copies

5.1. Immutability

The simplest way to avoid problems with concurrency is to share only immutable data between threads. Immutable data is data which cannot be changed.

To make a class immutable:

  • declare all its fields final
  • declare the class itself final
  • do not let the this reference escape during construction
  • any fields which refer to mutable data objects must
      • be private
      • have no setter method
      • never be directly returned or otherwise exposed to a caller
      • only be changed internally in a way that is not visible and has no effect outside of the class

An immutable class may use some mutable data internally to manage its state, but from the outside neither this class nor any attribute of this class can be changed.

For all mutable fields, e.g. arrays, that are passed from the outside to the class during the construction phase, the class needs to make a defensive copy of the elements to make sure that no other object from the outside can still change the data.
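The following minimal sketch (the class name ImmutableRange is hypothetical) applies these rules, including a defensive copy of an array that is passed in during construction:

import java.util.Arrays;

public final class ImmutableRange {
    private final int[] values;

    public ImmutableRange(int[] values) {
        // defensive copy: later changes to the caller's array have no effect here
        this.values = Arrays.copyOf(values, values.length);
    }

    public int[] getValues() {
        // return a copy so that callers cannot modify the internal state
        return Arrays.copyOf(values, values.length);
    }
}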

5.2. Defensive Copies

You must protect your classes from calling code. Assume that calling code will do its best to change your data in a way you did not expect. While this is especially important for immutable data, it also applies to mutable data which you do not expect to be changed from outside your class.

To protect your class against that you should copy data you receive and only return copies of data to calling code.

The following example wraps the internal list (ArrayList) and returns only an unmodifiable view of it. This way a client of this class cannot remove elements from or add elements to the list.

package de.vogella.performance.defensivecopy;

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class MyDataStructure {
    List<String> list = new ArrayList<String>();

    public void add(String s) {
        list.add(s);
    }

    /**
     * Returns an unmodifiable view of the list.
     * This way calling code cannot modify the list itself.
     *
     * @return an unmodifiable List<String>
     */
    public List<String> getList() {
        return Collections.unmodifiableList(list);
    }
}

6. Threads in Java

The basic means for concurrency is the java.lang.Thread class. A Thread executes an object of type java.lang.Runnable.

Runnable is an interface that defines the run() method. This method is called by the Thread object and contains the work which should be done. Therefore the "Runnable" is the task to perform. The Thread is the worker who is doing this task.

The following demonstrates a task (Runnable) which counts the sum of a given range of numbers. Create a new Java project called de.vogella.concurrency.threads for the example code of this section.

package de.vogella.concurrency.threads;

/**
 * MyRunnable will count the sum of the number from 1 to the parameter
 * countUntil and then write the result to the console.
 * <p>
 * MyRunnable is the task which will be performed
 *
 * @author Lars Vogel
 *
 */
public class MyRunnable implements Runnable {
    private final long countUntil;

    MyRunnable(long countUntil) {
        this.countUntil = countUntil;
    }

    @Override
    public void run() {
        long sum = 0;
        for (long i = 1; i < countUntil; i++) {
            sum += i;
        }
        System.out.println(sum);
    }
}

The following example demonstrates the usage of the Thread class and the Runnable interface.

package de.vogella.concurrency.threads;

import java.util.ArrayList;
import java.util.List;

public class Main {

    public static void main(String[] args) {
        // We will store the threads so that we can check if they are done
        List<Thread> threads = new ArrayList<Thread>();
        // We will create 500 threads
        for (int i = 0; i < 500; i++) {
            Runnable task = new MyRunnable(10000000L + i);
            Thread worker = new Thread(task);
            // We can set the name of the thread
            worker.setName(String.valueOf(i));
            // Start the thread, never call method run() direct
            worker.start();
            // Remember the thread for later usage
            threads.add(worker);
        }
        int running = 0;
        do {
            running = 0;
            for (Thread thread : threads) {
                if (thread.isAlive()) {
                    running++;
                }
            }
            System.out.println("We have " + running + " running threads. ");
        } while (running > 0);

    }
}

Using the Thread class directly has the following disadvantages.

  • Creating a new thread causes some performance overhead.
  • Too many threads can lead to reduced performance, as the CPU needs to switch between these threads.
  • You cannot easily control the number of threads, therefore you may run into out of memory errors due to too many threads.

The java.util.concurrent package offers improved support for concurrency compared to the direct usage of Threads. This package is described in the next section.

7. Threads pools with the Executor Framework

You can find these examples in the Java project called de.vogella.concurrency.threadpools.

A thread pool manages a pool of worker threads. It contains a work queue which holds tasks waiting to get executed.

A thread pool can be described as a collection of Runnable objects (the work queue) and a collection of running threads. These threads are constantly running and check the work queue for new work. If there is new work to be done, they execute this Runnable. The Executor interface provides the execute(Runnable r) method to add a new Runnable object to the work queue.

The Executor framework provides implementations of the java.util.concurrent.Executor interface, e.g. Executors.newFixedThreadPool(int n), which creates a pool of n worker threads. The ExecutorService interface adds life cycle methods to Executor, which allow you to shut down the executor and to wait for termination.

If you want to use one thread pool with one thread which executes several runnables you can use the Executors.newSingleThreadExecutor() method.

Create again the Runnable.

package de.vogella.concurrency.threadpools;

/**
 * MyRunnable will count the sum of the number from 1 to the parameter
 * countUntil and then write the result to the console.
 * <p>
 * MyRunnable is the task which will be performed
 *
 * @author Lars Vogel
 *
 */
public class MyRunnable implements Runnable {
    private final long countUntil;

    MyRunnable(long countUntil) {
        this.countUntil = countUntil;
    }

    @Override
    public void run() {
        long sum = 0;
        for (long i = 1; i < countUntil; i++) {
            sum += i;
        }
        System.out.println(sum);
    }
}

Now you run your runnables with the executor framework.

package de.vogella.concurrency.threadpools;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class Main {
    private static final int NTHREDS = 10;

    public static void main(String[] args) throws InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(NTHREDS);
        for (int i = 0; i < 500; i++) {
            Runnable worker = new MyRunnable(10000000L + i);
            executor.execute(worker);
        }
        // This will make the executor accept no new threads
        // and finish all existing threads in the queue
        executor.shutdown();
        // Wait until all tasks are finished
        executor.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
        System.out.println("Finished all threads");
    }
}

In case the tasks should return some value (result-bearing tasks), you can use the java.util.concurrent.Callable interface.

8. Futures and Callables

8.1. Futures and Callables

The executor framework presented in the last chapter uses Runnable objects. Unfortunately a Runnable cannot return a result to the caller.

In case you expect your threads to return a computed result, you can use java.util.concurrent.Callable. A Callable allows you to return a value after completion.

The Callable object uses generics to define the type of object which is returned.

If you submit a Callable object to an Executor, the framework returns an object of type java.util.concurrent.Future. Future exposes methods allowing a client to monitor the progress of a task being executed by a different thread. Therefore, a Future object can be used to check the status of a Callable. It can also be used to retrieve the result from the Callable.

On the ExecutorService you can use the submit() method to submit a Callable and get back a Future. To retrieve the result of the Future use the get() method.

package de.vogella.concurrency.callables;

import java.util.concurrent.Callable;

public class MyCallable implements Callable<Long> {
    @Override
    public Long call() throws Exception {
        long sum = 0;
        for (long i = 0; i <= 100; i++) {
            sum += i;
        }
        return sum;
    }
}

package de.vogella.concurrency.callables;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CallableFutures {
    private static final int NTHREDS = 10;

    public static void main(String[] args) {

        ExecutorService executor = Executors.newFixedThreadPool(NTHREDS);
        List<Future<Long>> list = new ArrayList<Future<Long>>();
        for (int i = 0; i < 20000; i++) {
            Callable<Long> worker = new MyCallable();
            Future<Long> submit = executor.submit(worker);
            list.add(submit);
        }
        long sum = 0;
        System.out.println(list.size());
        // now retrieve the result
        for (Future<Long> future : list) {
            try {
                sum += future.get();
            } catch (InterruptedException e) {
                e.printStackTrace();
            } catch (ExecutionException e) {
                e.printStackTrace();
            }
        }
        System.out.println(sum);
        executor.shutdown();
    }
}

8.2. Drawbacks with Futures and Callables

The Future interface is limited as a model of asynchronously executed tasks. Future allows a client to query a Callable task for its result. It does not provide the option to register a callback method, which would let you be notified once a task is done. In Java 5 you could use ExecutorCompletionService for this purpose, but as of Java 8 you can use the CompletableFuture class, which allows you to register callbacks that are invoked once a task is completed.
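As a hedged sketch of the ExecutorCompletionService approach mentioned above (the task and its values are made up for illustration), results can be consumed in completion order like this:

import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CompletionServiceExample {
    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        CompletionService<Long> completionService = new ExecutorCompletionService<Long>(executor);

        for (int i = 0; i < 10; i++) {
            final long countUntil = 1000L + i;
            completionService.submit(() -> {
                long sum = 0;
                for (long j = 1; j <= countUntil; j++) {
                    sum += j;
                }
                return sum;
            });
        }

        // take() blocks until some task finishes and returns its Future,
        // so results are processed in completion order, not submission order
        for (int i = 0; i < 10; i++) {
            System.out.println(completionService.take().get());
        }
        executor.shutdown();
    }
}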

9. CompletableFuture

Asynchronous task handling is important for any application which performs time-consuming activities, such as I/O operations. Two basic approaches to asynchronous task handling are available to a Java application:

  • application logic blocks until a task completes
  • application logic is called once the task completes; this is called a nonblocking approach

CompletableFuture extends the functionality of the Future interface for asynchronous calls. It also implements the CompletionStage interface. CompletionStage offers methods that let you attach callbacks which will be executed on completion.

It adds standard techniques for executing application code when a task completes, including various ways to combine tasks. CompletableFuture supports both blocking and nonblocking approaches, including regular callbacks.

Such a callback can be executed in a different thread than the thread in which the CompletableFuture's task is executed.

The following example demonstrates how to create a basic CompletableFuture.

CompletableFuture.supplyAsync(this::doSomething);

CompletableFuture.supplyAsync runs the task asynchronously on the default thread pool of Java (the common fork-join pool). Optionally, you can supply your own Executor to define the thread pool, as sketched below.
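For example, a hedged sketch of the overload that takes a custom executor (pool size and supplier value are arbitrary):

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CustomExecutorSnippet {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        // the supplier runs on the custom pool instead of the common fork-join pool
        CompletableFuture<Integer> future = CompletableFuture.supplyAsync(() -> 20, pool);
        System.out.println(future.get());
        pool.shutdown();
    }
}

The next, more complete example uses the single-argument variant.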

package snippet;

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

public class CompletableFutureSimpleSnippet {
    public static void main(String[] args) {
        long started = System.currentTimeMillis();

        // configure CompletableFuture
        CompletableFuture<Integer> futureCount = createCompletableFuture();

        // continue to do other work
        System.out.println("Took " + (System.currentTimeMillis() - started) + " milliseconds");

        // now its time to get the result
        try {
            int count = futureCount.get();
            System.out.println("CompletableFuture took " + (System.currentTimeMillis() - started) + " milliseconds");
            System.out.println("Result " + count);
        } catch (InterruptedException | ExecutionException ex) {
            // Exceptions from the future should be handled here
        }
    }

    private static CompletableFuture<Integer> createCompletableFuture() {
        CompletableFuture<Integer> futureCount = CompletableFuture.supplyAsync(
                () -> {
                    try {
                        // simulate long running task
                        Thread.sleep(5000);
                    } catch (InterruptedException e) { }
                    return 20;
                });
        return futureCount;
    }

}

The usage of the thenApply method is demonstrated by the following code snippet.

package snippet;

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

public class CompletableFutureCallback {
    public static void main(String[] args) {
        long started = System.currentTimeMillis();

        CompletableFuture<String> data = createCompletableFuture()
                .thenApply((Integer count) -> {
                    int transformedValue = count * 10;
                    return transformedValue;
                }).thenApply(transformed -> "Finally creates a string: " + transformed);

        try {
            System.out.println(data.get());
        } catch (InterruptedException | ExecutionException e) {
            // Exceptions from the future should be handled here
        }
    }

    public static CompletableFuture<Integer> createCompletableFuture() {
        CompletableFuture<Integer>  result = CompletableFuture.supplyAsync(() -> {
            try {
                // simulate long running task
                Thread.sleep(5000);
            } catch (InterruptedException e) { }
            return 20;
        });
        return result;
    }

}

10. Nonblocking algorithms

Java 5.0 provides support for additional atomic operations. This allows you to develop algorithms which are non-blocking, i.e. which do not require synchronization, but are based on low-level atomic hardware primitives such as compare-and-swap (CAS). A compare-and-swap operation checks whether a variable still holds a certain expected value and, only if it does, atomically replaces it with a new value.

Non-blocking algorithms are typically faster than blocking algorithms, as the synchronization between threads happens on a much finer level (in hardware).

For example, the following creates a non-blocking counter which always increases. This example is contained in the project called de.vogella.concurrency.nonblocking.counter.

package de.vogella.concurrency.nonblocking.counter;

import java.util.concurrent.atomic.AtomicInteger;

public class Counter {
    private AtomicInteger value = new AtomicInteger();
    public int getValue(){
        return value.get();
    }
    public int increment(){
        return value.incrementAndGet();
    }

    // Alternative implementation of increment() which makes the
    // underlying compare-and-set loop explicit
    public int incrementLongVersion(){
        int oldValue = value.get();
        while (!value.compareAndSet(oldValue, oldValue+1)){
             oldValue = value.get();
        }
        return oldValue+1;
    }

}

And a test.

package de.vogella.concurrency.nonblocking.counter;

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class Test {
        private static final int NTHREDS = 10;

        public static void main(String[] args) {
            final Counter counter = new Counter();
            List<Future<Integer>> list = new ArrayList<Future<Integer>>();

            ExecutorService executor = Executors.newFixedThreadPool(NTHREDS);
            for (int i = 0; i < 500; i++) {
                Callable<Integer> worker = new  Callable<Integer>() {
                    @Override
                    public Integer call() throws Exception {
                        int number = counter.increment();
                        System.out.println(number );
                        return number ;
                    }
                };
                Future<Integer> submit= executor.submit(worker);
                list.add(submit);

            }


            // This will make the executor accept no new threads
            // and finish all existing threads in the queue
            executor.shutdown();
            // Wait until all tasks are finished
            while (!executor.isTerminated()) {
            }
            Set<Integer> set = new HashSet<Integer>();
            for (Future<Integer> future : list) {
                try {
                    set.add(future.get());
                } catch (InterruptedException e) {
                    e.printStackTrace();
                } catch (ExecutionException e) {
                    e.printStackTrace();
                }
            }
            if (list.size()!=set.size()){
                throw new RuntimeException("Double-entries!!!");
            }

        }


}

The interesting part is how the incrementAndGet() method is implemented. It uses a CAS operation.

public final int incrementAndGet() {
        for (;;) {
            int current = get();
            int next = current + 1;
            if (compareAndSet(current, next))
                return next;
        }
    }

The JDK itself makes more and more use of non-blocking algorithms to increase performance for every developer. Developing correct non-blocking algorithms is not a trivial task.

For more information on non-blocking algorithms, e.g. examples of a non-blocking stack and a non-blocking linked list, please see http://www.ibm.com/developerworks/java/library/j-jtp04186/index.html

11. Fork-Join in Java 7

Java 7 introduced a new parallel mechanism for compute-intensive tasks, the fork-join framework. The fork-join framework allows you to distribute a certain task to several workers and then wait for the result.

For Java 6.0 you can download the package (jsr166y) from the Download site.

For testing create the Java project "de.vogella.performance.forkjoin". If you are not using Java 7 you also need to add jsr166y.jar to the classpath.

First create an algorithm package and then the following class.

package algorithm;

import java.util.Random;

/**
 *
 * This class defines a long list of integers which defines the problem we will
 * later try to solve
 *
 */
public class Problem {
    private final int[] list = new int[2000000];

    public Problem() {
        Random generator = new Random(19580427);
        for (int i = 0; i < list.length; i++) {
            list[i] = generator.nextInt(500000);
        }
    }

    public int[] getList() {
        return list;
    }

}

Now define the Solver class as shown in the following example.

The API defines other top-level classes, e.g. RecursiveAction and AsyncAction. Check the Javadoc for details.

package algorithm;

import java.util.Arrays;

import jsr166y.forkjoin.RecursiveAction;

public class Solver extends RecursiveAction {
    private int[] list;
    public long result;

    public Solver(int[] array) {
        this.list = array;
    }

    @Override
    protected void compute() {
        if (list.length == 1) {
            result = list[0];
        } else {
            int midpoint = list.length / 2;
            int[] l1 = Arrays.copyOfRange(list, 0, midpoint);
            int[] l2 = Arrays.copyOfRange(list, midpoint, list.length);
            Solver s1 = new Solver(l1);
            Solver s2 = new Solver(l2);
            forkJoin(s1, s2);
            result = s1.result + s2.result;
        }
    }

    public long getResult() {
        return result;
    }
}

Now define a small test class for testing its efficiency.

package testing;

import jsr166y.forkjoin.ForkJoinExecutor;
import jsr166y.forkjoin.ForkJoinPool;
import algorithm.Problem;
import algorithm.Solver;

public class Test {

    public static void main(String[] args) {
        Problem test = new Problem();
        // check the number of available processors
        int nThreads = Runtime.getRuntime().availableProcessors();
        System.out.println(nThreads);
        Solver mfj = new Solver(test.getList());
        ForkJoinExecutor pool = new ForkJoinPool(nThreads);
        pool.invoke(mfj);
        long result = mfj.getResult();
        System.out.println("Done. Result: " + result);
        long sum = 0;
        // check if the result was ok
        for (int i = 0; i < test.getList().length; i++) {
            sum += test.getList()[i];
        }
        System.out.println("Done. Result: " + sum);
    }
}

12. Deadlock

A concurrent application has the risk of a deadlock. A set of processes is deadlocked if each process is waiting for an event which only another process in the same set can cause.

For example, if thread A waits for a lock on object Z which thread B holds, and thread B waits for a lock on object Y which is held by thread A, then these two threads are deadlocked and cannot continue their processing, as sketched below.
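A minimal hedged sketch of exactly this situation (lock objects and timing are made up; the sleep just makes the deadlock very likely):

public class DeadlockExample {
    private static final Object LOCK_Y = new Object();
    private static final Object LOCK_Z = new Object();

    public static void main(String[] args) {
        Thread threadA = new Thread(() -> {
            synchronized (LOCK_Z) {          // A holds Z
                pause();
                synchronized (LOCK_Y) {      // ... and waits for Y, held by B
                    System.out.println("A acquired both locks");
                }
            }
        });
        Thread threadB = new Thread(() -> {
            synchronized (LOCK_Y) {          // B holds Y
                pause();
                synchronized (LOCK_Z) {      // ... and waits for Z, held by A
                    System.out.println("B acquired both locks");
                }
            }
        });
        threadA.start();
        threadB.start();
    }

    private static void pause() {
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}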

This can be compared to a traffic jam, where cars (threads) require access to a certain street (resource), which is currently blocked by another car (lock).


Copyright © 2012-2017 vogella GmbH. Free use of the software examples is granted under the terms of the EPL License. This tutorial is published under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Germany license.

See Licence.

from:http://www.vogella.com/tutorials/JavaConcurrency/article.html

Using volatile variables correctly

Volatile variables in the Java language can be thought of as a lighter-weight form of synchronized: compared with synchronized blocks, volatile variables require less coding and have less runtime overhead, but what they can accomplish is only a subset of what synchronized can. This article presents several patterns for using volatile variables effectively and highlights several situations in which volatile variables are not suitable.

Locks provide two main features: mutual exclusion and visibility. Mutual exclusion means that only one thread at a time may hold a given lock, so this property can be used to implement protocols for coordinated access to shared data, such that only one thread can use the shared data at a time. Visibility is more subtle: it must ensure that changes made to shared data before a lock is released are visible to another thread that subsequently acquires that lock. Without the visibility guarantee provided by synchronization, threads could see stale or inconsistent values of shared variables, which can cause many serious problems.

Volatile variables

Volatile variables have the visibility characteristics of synchronized, but not its atomicity. This means that threads automatically see the most up-to-date value of a volatile variable. Volatile variables can be used to provide thread safety, but only for a very limited set of use cases: those where there are no constraints between multiple variables or between a variable's current value and its future value. So volatile alone is not strong enough to implement a counter, a mutex, or any class that has invariants relating multiple variables (such as "start <= end").

You might prefer volatile variables to locks for reasons of simplicity or scalability. Some idioms are easier to code and read when volatile variables are used instead of locks. In addition, volatile variables, unlike locks, cannot cause a thread to block, so they rarely cause scalability problems. In some situations, if reads greatly outnumber writes, volatile variables can also offer a performance advantage over locking.

Conditions for using volatile variables correctly

You can use volatile variables instead of locks only under a limited set of circumstances. For a volatile variable to provide the desired thread safety, both of the following conditions must be met:

  • Writes to the variable do not depend on its current value.
  • The variable is not involved in invariants with other variables.

In effect, these conditions state that the set of valid values that can be written to a volatile variable is independent of any other program state, including the variable's current state.

The first condition disqualifies volatile variables from being used as thread-safe counters. Although the increment operation (x++) looks like a single operation, it is really a compound read-modify-write sequence of operations that must execute atomically, and volatile does not provide the necessary atomicity. Correct operation would require that the value of x not change during the operation, which a volatile variable cannot guarantee. (However, if the value is only ever written from a single thread, you can ignore the first condition.)

Most programming situations conflict with one of these two conditions, which makes volatile variables less universally applicable than synchronized for implementing thread safety. Listing 1 shows a non-thread-safe number range class. It contains an invariant: the lower bound is always less than or equal to the upper bound.

Listing 1. A non-thread-safe number range class
@NotThreadSafe
public class NumberRange {
    private int lower, upper;
    public int getLower() { return lower; }
    public int getUpper() { return upper; }
    public void setLower(int value) {
        if (value > upper)
            throw new IllegalArgumentException(...);
        lower = value;
    }
    public void setUpper(int value) {
        if (value < lower)
            throw new IllegalArgumentException(...);
        upper = value;
    }
}

This constrains the state variables of the range, so declaring the lower and upper fields as volatile is not sufficient to make the class thread safe; synchronization is still needed. Otherwise, if two threads happen to execute setLower and setUpper at the same time with inconsistent values, the range can end up in an inconsistent state. For example, if the initial state is (0, 5) and, at the same time, thread A calls setLower(4) while thread B calls setUpper(3), the two interleaved operations store values that do not satisfy the invariant, yet both threads pass the check that is supposed to protect it, leaving the final range as (4, 3), an invalid value. To make operations on the range safe, we would need to make setLower() and setUpper() atomic, and declaring the fields as volatile cannot achieve that.

Performance considerations

The primary reason to use volatile variables is simplicity: in some situations, using a volatile variable is much simpler than using the corresponding lock. A secondary reason is performance: in some cases, the volatile form of synchronization performs better than locking.

It is hard to make accurate, sweeping statements such as "X is always faster than Y", especially when it comes to operations internal to the JVM. (For example, in some cases the VM may be able to eliminate locking entirely, which makes it hard to compare the cost of volatile and synchronized in the abstract.) That said, on most current processor architectures volatile reads are very cheap, nearly as cheap as non-volatile reads. Volatile writes are considerably more expensive than non-volatile writes, because a memory fence is needed to guarantee visibility, but even so the total cost of volatile is usually still lower than that of lock acquisition.

Volatile operations, unlike locks, do not cause blocking, so in situations where volatile can be used safely it can offer scalability advantages over locks. If reads far outnumber writes, volatile variables can usually reduce the performance cost of synchronization compared with locks.

Patterns for using volatile correctly

Many concurrency experts in fact tend to steer users away from volatile variables, because they are easier to misuse than locks. However, if you carefully follow a number of well-defined patterns, volatile variables can be used safely in many situations. Always keep the limitation of volatile in mind: use volatile only when the state is truly independent of everything else in the program. This rule prevents these patterns from being stretched into unsafe use cases.

Pattern #1: status flags

Perhaps the canonical use of a volatile variable is a simple boolean status flag, indicating that an important one-time event has happened, such as initialization being completed or shutdown being requested.

Many applications contain a control structure of the form "keep doing work as long as we are not yet ready to stop the program", as shown in Listing 2:

Listing 2. Using a volatile variable as a status flag
volatile boolean shutdownRequested;
...
public void shutdown() { shutdownRequested = true; }
public void doWork() {
    while (!shutdownRequested) {
        // do stuff
    }
}

The shutdown() method is most likely going to be called from outside the loop, that is, from another thread (it might be called from a JMX listener, an action listener on the GUI event thread, via RMI, via a web service, and so on), so some form of synchronization is needed to ensure that the shutdownRequested variable is correctly visible. However, writing the loop with a synchronized block would be much more cumbersome than using a volatile status flag as shown in Listing 2. Because volatile simplifies the coding, and the status flag does not depend on any other state in the program, this is an excellent place to use volatile.

A common characteristic of this kind of status flag is that there is typically only one state transition: the shutdownRequested flag goes from false to true, and then the program stops. The pattern can be extended to flags that flip back and forth, but only if it does not matter that a transition cycle (from false to true and back to false) may go unobserved. Otherwise, some atomic state-transition mechanism, such as an atomic variable, is needed.

Pattern #2: one-time safe publication

The visibility failures caused by a lack of synchronization become even harder to reason about when what is being written is an object reference rather than a primitive value. Without synchronization, it is possible to see an up-to-date value of an object reference (written by another thread) together with stale values of that object's state. (This is the root cause of the infamous double-checked locking problem, in which an object reference is read without synchronization; the risk is that you could see an up-to-date reference but still observe a partially constructed object through that reference.)

One technique for safely publishing an object is to declare the object reference as volatile. Listing 3 shows an example in which a background thread loads some data from a database during startup. Other code that can make use of this data checks, before using it, whether the data has been published yet.

Listing 3. Using a volatile variable for one-time safe publication
public class BackgroundFloobleLoader {
    public volatile Flooble theFlooble;
    public void initInBackground() {
        // do lots of stuff
        theFlooble = new Flooble();  // this is the only write to theFlooble
    }
}
public class SomeOtherClass {
    public void doWork() {
        while (true) {
            // do some stuff...
            // use the Flooble, but only if it is ready
            if (floobleLoader.theFlooble != null)
                doSomething(floobleLoader.theFlooble);
        }
    }
}

If the theFlooble reference were not volatile, the code in doWork() could, when dereferencing theFlooble, end up with a partially constructed Flooble.

A necessary condition for this pattern is that the published object must be either thread safe or effectively immutable (effectively immutable means that the object's state is never modified after publication). A volatile reference ensures the visibility of the object in its published form, but if the object's state will change after publication, additional synchronization is required.

Pattern #3: independent observations

Another simple pattern for using volatile safely is periodically "publishing" observations for use within the program. For example, imagine an environmental sensor that senses the ambient temperature. A background thread might read the sensor every few seconds and update a volatile variable containing the current temperature. Other threads can then read this variable and always see the most recent temperature value.

Another application of this pattern is collecting program statistics. Listing 4 shows how an authentication mechanism might remember the name of the user who logged in most recently. The lastUser reference is used over and over to publish a value for the rest of the program to use.

Listing 4. Using a volatile variable to publish a series of independent observations
public class UserManager {
    public volatile String lastUser;
    public boolean authenticate(String user, String password) {
        boolean valid = passwordIsValid(user, password);
        if (valid) {
            User u = new User();
            activeUsers.add(u);
            lastUser = user;
        }
        return valid;
    }
}

This pattern is an extension of the previous one: a value is published for use elsewhere in the program, but instead of a one-time event, it is a series of independent observations. This pattern requires that the published value be effectively immutable, that is, its state does not change after publication. Code that uses the value needs to be aware that it may change at any time.

Pattern #4: the "volatile bean" pattern

The volatile bean pattern is applicable in frameworks that use JavaBeans as glorified data structures. In the volatile bean pattern, the JavaBean is used as a container for a group of independent properties with getter and/or setter methods. The rationale for the volatile bean pattern is that many frameworks provide containers for holders of mutable data (for example HttpSession), but the objects placed in those containers must be thread safe.

In the volatile bean pattern, all the data members of the JavaBean are volatile, and the getters and setters must be trivial: apart from getting or setting the corresponding property, they must contain no logic. Furthermore, for data members that are object references, the referenced objects must be effectively immutable. (This rules out array-valued properties, because when an array reference is declared volatile, only the reference, not the array elements themselves, has volatile semantics.) As with any volatile variable, no invariant or constraint may involve the JavaBean properties. Listing 5 shows an example of a JavaBean that follows the volatile bean pattern:

Listing 5. A Person object that follows the volatile bean pattern
@ThreadSafe
public class Person {
    private volatile String firstName;
    private volatile String lastName;
    private volatile int age;
    public String getFirstName() { return firstName; }
    public String getLastName() { return lastName; }
    public int getAge() { return age; }
    public void setFirstName(String firstName) {
        this.firstName = firstName;
    }
    public void setLastName(String lastName) {
        this.lastName = lastName;
    }
    public void setAge(int age) {
        this.age = age;
    }
}

Advanced patterns for volatile

The patterns in the previous sections cover most of the basic cases where using volatile is useful and straightforward. This section introduces a more advanced pattern in which volatile provides a performance or scalability benefit.

Advanced uses of volatile are very fragile. The assumptions they rest on must therefore be justified carefully, and these patterns must be strictly encapsulated, because even a very small change can break your code. Also, since the reason for using the more advanced volatile use cases is the performance gain, make sure you really need that gain before you start applying them. These patterns trade readability and maintainability for a possible performance improvement; if you do not need the speedup (or cannot demonstrate that you need it with a rigorous test program), this is probably a bad trade, because you are likely to give up more than you gain.

Pattern #5: the cheap read-write lock trick

By now you should understand that volatile alone is not strong enough to implement a counter. Because ++x is really a simple combination of three operations (read, add, store), updates can be lost if multiple threads happen to try to increment a volatile counter at the same time.

However, if reads far outnumber writes, you can combine intrinsic locking and volatile variables to reduce the cost of the common code path. The thread-safe counter shown in Listing 6 uses synchronized to ensure that the increment operation is atomic, and volatile to guarantee the visibility of the current result. If updates are infrequent, this approach can achieve better performance, because the cost of the read path involves only a volatile read, which is usually cheaper than even an uncontended lock acquisition.

Listing 6. Combining volatile and synchronized to implement a "cheap read-write lock"
@ThreadSafe
public class CheesyCounter {
    // Employs the cheap read-write lock trick
    // All mutative operations MUST be done with the 'this' lock held
    @GuardedBy("this") private volatile int value;
    public int getValue() { return value; }
    public synchronized int increment() {
        return value++;
    }
}

This technique is called the "cheap read-write lock" because you use different synchronization mechanisms for reads and writes. Because the writes in this example violate the first condition for using volatile, you cannot use volatile to implement the counter safely; you must use a lock. However, you can use volatile on the read path to ensure the visibility of the current value, so the lock is used for all mutating operations and volatile for read-only operations. Where the lock allows only one thread at a time to access the value, volatile allows multiple threads to perform reads, so guarding the read code path with volatile provides a higher degree of sharing than executing all code paths under the lock, just as a read-write lock does. However, always keep the weakness of this pattern in mind: if you go beyond its most basic application, combining these two competing synchronization mechanisms becomes very difficult.

Conclusion

Compared with locks, volatile variables are a very simple but also very fragile synchronization mechanism that, in some situations, offers better performance and scalability than locking. If you strictly follow the conditions for using volatile, namely that the variable is truly independent of both other variables and its own previous values, you can in some cases use volatile instead of synchronized to simplify your code. However, code that uses volatile tends to be more error prone than code that uses locks. The patterns presented in this article cover the most common use cases in which volatile can replace synchronized. Following these patterns, and taking care not to push them beyond their limits, should help you cover most such use cases safely and get better performance from volatile variables.



from:https://www.ibm.com/developerworks/cn/java/j-jtp06197.html

How I learned blockchain

This is an original article. When reposting, please note that it is reposted from Keegan小钢
and include the original link: http://keeganlee.me/post/full-stack/20170915
WeChat subscription account: keeganlee_me
Written on 2017-09-15


Column: https://xiaozhuanlan.com/fullstack


In the previous articles we covered the "way" and the "craft" of how to learn. Once you have learned them, you should put them into practice: only through continuous practice can you turn those knowledge resources into your own knowledge capital. If you read the earlier articles, thought they were really well written, and then neither reflected nor acted on them, then nothing will come of it. To better guide you in putting this into practice, in this article I will share how I turned knowledge resources into my own knowledge capital.

Why I chose blockchain

I chose blockchain as the case study for hands-on learning for three reasons:

  • First, blockchain is a field I only started learning in the last two or three months, and one I learned from scratch, so it makes the most instructive case study. Because not much time has passed, I still remember many details of the learning process, which lets me reconstruct it more faithfully.
  • Second, blockchain, or more precisely cryptocurrency, has become extremely popular recently; even the proverbial Chinese aunties have heard of it. Some people in my WeChat groups are trading or planning to trade, but most of them lack a correct understanding of Bitcoin and blockchain and are simply following the trend, which is actually very risky. So I feel it is also necessary to spread a correct understanding of blockchain and Bitcoin.
  • Third, blockchain is the cornerstone of the Internet of value and an inevitable trend of the future. It will become as fundamental a technology as HTTP, so it is something every engineer should master.

Regarding the second point, many people started paying attention to this space because they heard too many stories about getting rich from cryptocurrencies. But because they lack a real understanding of cryptocurrencies and blockchain, they cannot read the current state of the industry, let alone its future. And because they cannot read it, they either keep watching from the sidelines or act timidly. The most typical example is a colleague of mine who watched other people make money and kept wanting to get in. But he hesitated from the time Bitcoin was at 10,000 yuan until it reached 20,000, never daring to enter because the price always seemed too high, while he just watched it keep climbing. Later he did catch a dip, the one in mid-July when Bitcoin fell to around 13,000 and then bounced back to almost 20,000, and he finally made up his mind to get in, putting in about 2,000 yuan; he sold as soon as he had made 200. Later, before BCC took off from around 2,000 yuan, he caught that too. This time he was bolder and put in 10,000 yuan, but just like before, he got scared and cashed out after only a 10% gain. That BCC run more than doubled. Still, he was one of the lucky ones: at least he made money, even if not much. Many people lost money, and not a little. The point of all this is that if you want to make money in this industry, let alone keep making money over the long term, you simply cannot do it without understanding it.

The third point is the most important: blockchain will become a foundational technology. The HTTP-based Internet can be called the Internet of information; what it mainly transfers is information of all kinds. The blockchain-based Internet is called the Internet of value; what it transfers is value. In the era of the information Internet you needed to know HTTP well, so in the era of the value Internet you will need to know blockchain well. And the future is already here, so we should prepare ahead of time.

Before starting to learn

Before anyone starts learning a field, they will have heard or read at least some of its concepts and viewpoints; nobody suddenly decides to learn something while knowing literally nothing about it. For example, if you had never even heard of the term "blockchain", the idea of learning "blockchain" could never occur to you.

Before I decided to start learning blockchain, I had already heard and read about many related concepts, including blockchain 1.0, blockchain 2.0, blockchain 3.0, public chains, consortium chains, private chains, hard forks, soft forks, SegWit, the Lightning Network, the Raiden Network, Ethereum, Ethereum Classic, the coin scene, the chain scene, smart contracts, Bitcoin wallets, and so on: a big pile of unfamiliar concepts. At the time, though, what left the deepest impression on me was still Bitcoin, which rose from 10,000 to 15,000 yuan in a single week.

Most of the information I picked up back then came from 巴比特 (8btc). I could not actually understand many of the articles, mainly because there were too many concepts I did not know. But some things I did understand: for example, that Bitcoin can be called digital gold, so its future value should approach that of gold, and from that perspective Bitcoin still has plenty of room to rise; and that blockchain opens the era of the Internet of value and is where the trend is heading. These two points are mainly what sparked my interest in learning blockchain.

Setting goals

As the article "001 | How to learn efficiently" already said, the first condition for improving learning efficiency is being goal-oriented. Goal orientation is also the core of utilitarian learning. So when we learn something, the first step must be to set goals, and they must be clear and specific. Goals also come in two sizes: big goals and small goals. You first set the big goals and then break them down into small goals, each of which should be very clear, specific, and actionable. Below I explain how I broke my big goals down into small ones.

My goals for learning blockchain were very clear. At a high level there were just two:

  1. To build up the technical background for moving into blockchain development later;
  2. To guide how I invest in cryptocurrencies.

So what do I need to learn to reach these two goals? The first goal requires learning blockchain technology: which technologies blockchain involves, how blockchain works internally, and so on. The second goal requires learning about cryptocurrencies, including which cryptocurrencies exist, where to buy them, how to buy them, and what their future prospects look like. After sorting this out, it became a list of questions to be answered:

  • What is a blockchain?
  • What are the differences between blockchain 1.0, blockchain 2.0, and blockchain 3.0?
  • What is a public chain? A consortium chain? A private chain?
  • What is a fork? What is the difference between a hard fork and a soft fork?
  • What are SegWit, the Lightning Network, and the Raiden Network?
  • What is the essence of Bitcoin?
  • How are Bitcoin transactions carried out?
  • What is a Bitcoin wallet?
  • What is a smart contract?
  • What is Ethereum? What is the difference between Ethereum and Ethereum Classic?
  • What is the coin scene? What is the chain scene?
  • Where can I buy cryptocurrencies, and how?
  • What are the prospects for blockchain and the various cryptocurrencies?

Very often the questions you come up with on the first pass are not comprehensive, and there may even be very few of them, but that is fine: new questions keep emerging during the learning process. For example: which consensus algorithms are there? What is Hyperledger? What is an ICO and how does it work? These are all questions I discovered while learning.

In this step, the most important thing is to settle on the big goals and then break them down into small questions to be answered.

Gathering knowledge resources

Once your goals have been refined into specific small questions, you can gather all kinds of knowledge resources around those questions. There are three main kinds of knowledge resources: books, official documentation, and scattered articles on the web. When gathering resources, books should be the first choice, because they are relatively systematic and can answer most of our questions. Official documentation is mainly for deepening understanding; after all, books usually do not go into every technical detail. Some questions cannot be answered from books or official documentation at all, and for those you have to search for articles on the web. Also, people whose English is weaker may find reading official documentation too tiring and time-consuming, and they can search for Chinese articles online instead; but I still recommend reading the original sources whenever possible, because nothing gets distorted.

Coming back to my blockchain learning: for books, I mainly picked from the better-selling titles, trying to cover as many of the questions above as possible. After reading each book's description and table of contents on Amazon, I finally chose the following:

  • 《区块链:新经济蓝图及导读》: a book recommended by a colleague; it covers blockchain 1.0, 2.0 and 3.0 and helped me understand blockchain at a macro level.
  • 《区块链:技术驱动金融》: explains step by step, at a technical level, how Bitcoin works.
  • 《区块链技术指南》: a book that goes deeper into the technical foundations; it also covers smart contracts and Hyperledger and serves as a complement to the previous book.
  • 《区块链革命:比特币底层技术如何改变货币、商业和世界》: a panoramic description of blockchain theory and applications; this one is for broadening my horizons.

The official documentation mainly consists of the various whitepapers; the whitepapers I collected mainly include:

Finally, some questions cannot be answered directly from books or whitepapers, for example: what is the coin scene? What is the chain scene? A quick search on Baidu or Google answers questions like these. Some articles are quite long, such as 《详解最近大热的闪电网络、雷电网络和CORDA》, and hard to understand at first; that is fine, just collect them first and read them later during the extensive-reading phase. I recommend looking for resources on 巴比特 (8btc); most of them can be found there.

In this step, the main task is to gather all kinds of knowledge resources. First pick books, covering as broad a range as possible. Then gather the official documentation, which is first-hand information, detailed and undistorted, and deepens understanding. Only then collect scattered articles from the web; some of the simpler questions may be answered immediately, while the harder ones are kept for the extensive-reading phase later.

Extensive reading

With the knowledge resources gathered, you can start reading extensively. You may recall that the main purpose of extensive reading is to tease out the core concepts, the main viewpoints, and the framework and logic. Although we already knew some of the concepts and viewpoints before starting to learn, during the learning process we discover many more concepts and viewpoints we do not understand, and each of them needs to be understood one by one.

When we discussed utilitarian learning earlier, we said that knowledge resources can be divided into three types: entertainment, informational, and mind-expanding. Different types of knowledge deserve different time budgets: informational material is well suited to fragmented time, while mind-expanding material needs several uninterrupted hours of systematic study. So for the resources I gathered, I first distinguished which were informational and which were mind-expanding, and then used different time-management strategies to read them. For my blockchain study, most of the resources were informational, including most of the content of the books and the scattered articles collected online; those I generally read during my commute or over the lunch break. The specialized whitepapers and the parts of the books that dive into the technical architecture are mind-expanding, and I usually set aside two or three hours of focused, high-intensity study in the evening or on weekends for them.

How you read also matters. I do not read the books cover to cover, one after another; that would be far too inefficient. Instead I start from the individual questions and look for answers across the various knowledge resources in order to solve them, finishing one question before moving on to the next. Also, while solving a question you will usually run into new questions; just write the new ones down and keep working on the current one. If a question resists solution for a long time, you are probably not yet able to understand it at this stage, so set it aside and come back to it after learning the rest.

For example, when working on the question "What is Bitcoin?", I first read each book's introduction to Bitcoin to understand the overall framework, then studied the Bitcoin whitepaper to understand its technical architecture and some of the technical details, until I finally understood that, from a technical point of view, a bitcoin is essentially a special solution produced by a pile of complex algorithms. Along the way I ran into many new questions, such as "What is a consensus algorithm? What is a Merkle tree? What is mining?", so I noted them down first and went on to tackle the question "What is a blockchain?".

Below is my summary understanding of some core concepts after the extensive reading:

  1. Blockchain: in the narrow sense, a blockchain is a kind of distributed database; structurally it is a linked list of data blocks connected in chronological order, where each node of the chain is a block. A block typically packs its transaction data together via a binary tree (such as a Merkle tree) into an aggregate hash which, together with a timestamp, uniquely identifies the block. In the broad sense, a blockchain is a distributed infrastructure pattern that combines distributed data storage, peer-to-peer transmission, consensus mechanisms, cryptographic algorithms and several other technologies.
  2. Bitcoin: many people think of Bitcoin as just the number in their account, but strictly speaking Bitcoin is a peer-to-peer electronic cash system, an entire system, as the title of the Bitcoin whitepaper already states. You can simply think of Bitcoin as the first application of blockchain technology. Note that the concept of "Bitcoin" is older than "blockchain": the term "blockchain" was coined some time after Bitcoin had been running, by abstracting out Bitcoin's underlying technology. From an investment point of view, the total supply of bitcoins is fixed at 21 million, so its value can be compared to gold; given the gap between its current market capitalization and that of gold, there is still a lot of room to grow, so in the long run I remain very optimistic.
  3. Ethereum: the typical representative of blockchain 2.0. Its main contribution is smart contracts: developers can build and publish all kinds of distributed applications on the Ethereum platform. These applications are essentially contracts, and a smart contract is simply code that executes automatically when certain conditions are met. Many ICO tokens are applications created on the Ethereum platform.
  4. Public chains / consortium chains / private chains: a public chain is completely open, like Bitcoin or Ethereum. A consortium chain is not fully open; it is a blockchain jointly managed by several organizations, each of which runs one or more nodes. Its data can only be read, written and transacted by the organizations in the system, which record the transactions together. Each participant does not have to worry about where its data is stored; the data it produces is visible only to itself, and other participants' data can be seen only with keys they have authorized. This addresses data privacy and security while still achieving decentralization. A private chain is completely private and is generally suited to multinational corporations; I have not yet come across a representative application in this area.

In this step, the most efficient way to read is to look for answers with questions in mind. Also budget your time well: use fragmented time for informational material and blocks of focused time for mind-expanding material, and try to understand every core concept.

Building a model

In the previous step we roughly understood each core concept, but those are still isolated points. In this step we connect the points into lines and gradually into a web: that is, we work out how the different core concepts relate to one another and gradually form a system model. If you have access to one, a big whiteboard is of course the best thinking tool; otherwise a diagramming tool such as Visio or OmniGraffle works too, as does a mind-mapping tool such as XMind, or even just pen and paper. The most important thing is to draw it out.

Below is the system model diagram of blockchain that I drew with OmniGraffle; due to time and space constraints it shows only part of the picture:

Consulting experts

If there are still questions you cannot resolve, it is time to consult experts. There are, however, a few things to keep in mind.

First, try to have experts who focus on different fields in your network. Most people these days already follow quite a few accomplished people, so there is not much to say about that. But one thing I do want to stress: do not take every random question to an expert.

For one thing, experts are generally very busy and do not have much time to keep solving your problems, especially when you ask questions that bring them no value. For example, if you go to an Android architect and ask "How do I turn on Bluetooth on Android?", a question a quick web search would answer, even I would just reply "Please go search for it yourself." Since you are consulting an Android architect, you should ask architecture questions, and ideally questions that are deep, worth discussing, and valuable to them. Asking "What is MVP?" is not appropriate either; a better way is to first explain your own understanding of MVP and how you use it to structure a project, and then discuss the MVP architectural ideas and implementation options with them step by step.

For another, experts are an important resource for you, and you need to maintain the relationship. As the saying goes: the people who can help you are not your network; only the people you can help are your network. You can rarely help an expert in their own professional field, but you can try to help them as much as possible in other ways.

Back to learning: why should we finish building the system model in the previous step before consulting experts? As mentioned in earlier articles, if you lack a basic big-picture view, you cannot ask good questions. You can also make good use of Zhihu: for many questions there are already excellent answers from knowledgeable people there, and you can post questions yourself, provided, of course, that you ask good questions, so that capable people are willing to answer them.

Understanding and restating

The main value of restating is that it reinforces understanding. The Feynman technique is an excellent way to restate, and writing is another good one, preferably in public. With public writing, first, you have to write for people who do not know the topic, which pushes you to think things through more thoroughly; second, you get feedback from outside, which helps you refine and upgrade your understanding. That is why I recommend that everyone write a blog.

As for me, I mostly write articles and share them; I also sometimes give technical talks inside the company.

This step is perhaps the most mentally demanding, but it is also the key step that does the most to improve your ability to learn.

Summary

My way of learning can be summarized in the following steps:

  1. Set goals: first settle on the big goals, then break them down into small questions to be answered;
  2. Gather knowledge resources: there are three main kinds (books, official documentation and scattered articles on the web); gather them as completely as you can;
  3. Read extensively: the most efficient reading is looking for answers with questions in mind, and budget your time well;
  4. Build a model: connect the points of the core concepts into lines and gradually into a web, building a system model;
  5. Consult experts: remember not to take every random question to an expert, and maintain your expert relationships well;
  6. Understand and restate: the most critical step for improving your ability to learn; besides the Feynman technique, writing is also a recommended approach.

Reflection and practice

If you had to learn a new programming language, how would you go about learning it?

Data science Python notebooks

 

data-science-ipython-notebooks

Index

 

deep-learning

IPython Notebook(s) demonstrating deep learning functionality.

 

tensor-flow-tutorials

Additional TensorFlow tutorials:

Notebook Description
tsf-basics Learn basic operations in TensorFlow, a library for various kinds of perceptual and language understanding tasks from Google.
tsf-linear Implement linear regression in TensorFlow.
tsf-logistic Implement logistic regression in TensorFlow.
tsf-nn Implement nearest neighbors in TensorFlow.
tsf-alex Implement AlexNet in TensorFlow.
tsf-cnn Implement convolutional neural networks in TensorFlow.
tsf-mlp Implement multilayer perceptrons in TensorFlow.
tsf-rnn Implement recurrent neural networks in TensorFlow.
tsf-gpu Learn about basic multi-GPU computation in TensorFlow.
tsf-gviz Learn about graph visualization in TensorFlow.
tsf-lviz Learn about loss visualization in TensorFlow.

tensor-flow-exercises

Notebook Description
tsf-not-mnist Learn simple data curation by creating a pickle with formatted datasets for training, development and testing in TensorFlow.
tsf-fully-connected Progressively train deeper and more accurate models using logistic regression and neural networks in TensorFlow.
tsf-regularization Explore regularization techniques by training fully connected networks to classify notMNIST characters in TensorFlow.
tsf-convolutions Create convolutional neural networks in TensorFlow.
tsf-word2vec Train a skip-gram model over Text8 data in TensorFlow.
tsf-lstm Train a LSTM character model over Text8 data in TensorFlow.

 

theano-tutorials

Notebook Description
theano-intro Intro to Theano, which allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. It can use GPUs and perform efficient symbolic differentiation.
theano-scan Learn scans, a mechanism to perform loops in a Theano graph.
theano-logistic Implement logistic regression in Theano.
theano-rnn Implement recurrent neural networks in Theano.
theano-mlp Implement multilayer perceptrons in Theano.

 

keras-tutorials

Notebook Description
keras Keras is an open source neural network library written in Python. It is capable of running on top of either TensorFlow or Theano.
setup Learn about the tutorial goals and how to set up your Keras environment.
intro-deep-learning-ann Get an intro to deep learning with Keras and Artificial Neural Networks (ANN).
theano Learn about Theano by working with weights matrices and gradients.
keras-otto Learn about Keras by looking at the Kaggle Otto challenge.
ann-mnist Review a simple implementation of ANN for MNIST using Keras.
conv-nets Learn about Convolutional Neural Networks (CNNs) with Keras.
conv-net-1 Recognize handwritten digits from MNIST using Keras – Part 1.
conv-net-2 Recognize handwritten digits from MNIST using Keras – Part 2.
keras-models Use pre-trained models such as VGG16, VGG19, ResNet50, and Inception v3 with Keras.
auto-encoders Learn about Autoencoders with Keras.
rnn-lstm Learn about Recurrent Neural Networks (RNNs) with Keras.
lstm-sentence-gen Learn about RNNs using Long Short Term Memory (LSTM) networks with Keras.

deep-learning-misc

Notebook Description
deep-dream Caffe-based computer vision program which uses a convolutional neural network to find and enhance patterns in images.

 

scikit-learn

IPython Notebook(s) demonstrating scikit-learn functionality.

Notebook Description
intro Intro notebook to scikit-learn. Scikit-learn adds Python support for large, multi-dimensional arrays and matrices, along with a large library of high-level mathematical functions to operate on these arrays.
knn Implement k-nearest neighbors in scikit-learn.
linear-reg Implement linear regression in scikit-learn.
svm Implement support vector machine classifiers with and without kernels in scikit-learn.
random-forest Implement random forest classifiers and regressors in scikit-learn.
k-means Implement k-means clustering in scikit-learn.
pca Implement principal component analysis in scikit-learn.
gmm Implement Gaussian mixture models in scikit-learn.
validation Implement validation and model selection in scikit-learn.

 

statistical-inference-scipy

IPython Notebook(s) demonstrating statistical inference with SciPy functionality.

Notebook Description
scipy SciPy is a collection of mathematical algorithms and convenience functions built on the Numpy extension of Python. It adds significant power to the interactive Python session by providing the user with high-level commands and classes for manipulating and visualizing data.
effect-size Explore statistics that quantify effect size by analyzing the difference in height between men and women. Uses data from the Behavioral Risk Factor Surveillance System (BRFSS) to estimate the mean and standard deviation of height for adult women and men in the United States.
sampling Explore random sampling by analyzing the average weight of men and women in the United States using BRFSS data.
hypothesis Explore hypothesis testing by analyzing the difference of first-born babies compared with others.

 

pandas

IPython Notebook(s) demonstrating pandas functionality.

Notebook Description
pandas Software library written for data manipulation and analysis in Python. Offers data structures and operations for manipulating numerical tables and time series.
github-data-wrangling Learn how to load, clean, merge, and feature engineer by analyzing GitHub data from the Viz repo.
Introduction-to-Pandas Introduction to Pandas.
Introducing-Pandas-Objects Learn about Pandas objects.
Data Indexing and Selection Learn about data indexing and selection in Pandas.
Operations-in-Pandas Learn about operating on data in Pandas.
Missing-Values Learn about handling missing data in Pandas.
Hierarchical-Indexing Learn about hierarchical indexing in Pandas.
Concat-And-Append Learn about combining datasets: concat and append in Pandas.
Merge-and-Join Learn about combining datasets: merge and join in Pandas.
Aggregation-and-Grouping Learn about aggregation and grouping in Pandas.
Pivot-Tables Learn about pivot tables in Pandas.
Working-With-Strings Learn about vectorized string operations in Pandas.
Working-with-Time-Series Learn about working with time series in pandas.
Performance-Eval-and-Query Learn about high-performance Pandas: eval() and query() in Pandas.

 

matplotlib

IPython Notebook(s) demonstrating matplotlib functionality.

Notebook Description
matplotlib Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms.
matplotlib-applied Apply matplotlib visualizations to Kaggle competitions for exploratory data analysis. Learn how to create bar plots, histograms, subplot2grid, normalized plots, scatter plots, subplots, and kernel density estimation plots.
Introduction-To-Matplotlib Introduction to Matplotlib.
Simple-Line-Plots Learn about simple line plots in Matplotlib.
Simple-Scatter-Plots Learn about simple scatter plots in Matplotlib.
Errorbars.ipynb Learn about visualizing errors in Matplotlib.
Density-and-Contour-Plots Learn about density and contour plots in Matplotlib.
Histograms-and-Binnings Learn about histograms, binnings, and density in Matplotlib.
Customizing-Legends Learn about customizing plot legends in Matplotlib.
Customizing-Colorbars Learn about customizing colorbars in Matplotlib.
Multiple-Subplots Learn about multiple subplots in Matplotlib.
Text-and-Annotation Learn about text and annotation in Matplotlib.
Customizing-Ticks Learn about customizing ticks in Matplotlib.
Settings-and-Stylesheets Learn about customizing Matplotlib: configurations and stylesheets.
Three-Dimensional-Plotting Learn about three-dimensional plotting in Matplotlib.
Geographic-Data-With-Basemap Learn about geographic data with basemap in Matplotlib.
Visualization-With-Seaborn Learn about visualization with Seaborn.

 

numpy

IPython Notebook(s) demonstrating NumPy functionality.

Notebook Description
numpy Adds Python support for large, multi-dimensional arrays and matrices, along with a large library of high-level mathematical functions to operate on these arrays.
Introduction-to-NumPy Introduction to NumPy.
Understanding-Data-Types Learn about data types in Python.
The-Basics-Of-NumPy-Arrays Learn about the basics of NumPy arrays.
Computation-on-arrays-ufuncs Learn about computations on NumPy arrays: universal functions.
Computation-on-arrays-aggregates Learn about aggregations: min, max, and everything in between in NumPy.
Computation-on-arrays-broadcasting Learn about computation on arrays: broadcasting in NumPy.
Boolean-Arrays-and-Masks Learn about comparisons, masks, and boolean logic in NumPy.
Fancy-Indexing Learn about fancy indexing in NumPy.
Sorting Learn about sorting arrays in NumPy.
Structured-Data-NumPy Learn about structured data: NumPy’s structured arrays.

 

python-data

IPython Notebook(s) demonstrating Python functionality geared towards data analysis.

Notebook Description
data structures Learn Python basics with tuples, lists, dicts, sets.
data structure utilities Learn Python operations such as slice, range, xrange, bisect, sort, sorted, reversed, enumerate, zip, list comprehensions.
functions Learn about more advanced Python features: Functions as objects, lambda functions, closures, *args, **kwargs currying, generators, generator expressions, itertools.
datetime Learn how to work with Python dates and times: datetime, strftime, strptime, timedelta.
logging Learn about Python logging with RotatingFileHandler and TimedRotatingFileHandler.
pdb Learn how to debug in Python with the interactive source code debugger.
unit tests Learn how to test in Python with Nose unit tests.

 

kaggle-and-business-analyses

IPython Notebook(s) used in kaggle competitions and business analyses.

Notebook Description
titanic Predict survival on the Titanic. Learn data cleaning, exploratory data analysis, and machine learning.
churn-analysis Predict customer churn. Exercise logistic regression, gradient boosting classifiers, support vector machines, random forests, and k-nearest-neighbors. Includes discussions of confusion matrices, ROC plots, feature importances, prediction probabilities, and calibration/discrimination.

 

spark

IPython Notebook(s) demonstrating spark and HDFS functionality.

Notebook Description
spark In-memory cluster computing framework, up to 100 times faster for certain applications and is well suited for machine learning algorithms.
hdfs Reliably stores very large files across machines in a large cluster.

 

mapreduce-python

IPython Notebook(s) demonstrating Hadoop MapReduce with mrjob functionality.

Notebook Description
mapreduce-python Runs MapReduce jobs in Python, executing jobs locally or on Hadoop clusters. Demonstrates Hadoop Streaming in Python code with unit test and mrjob config file to analyze Amazon S3 bucket logs on Elastic MapReduce. Disco is another python-based alternative.

 

aws

IPython Notebook(s) demonstrating Amazon Web Services (AWS) and AWS tools functionality.

Also check out:

  • SAWS: A Supercharged AWS command line interface (CLI).
  • Awesome AWS: A curated list of libraries, open source repos, guides, blogs, and other resources.
Notebook Description
boto Official AWS SDK for Python.
s3cmd Interacts with S3 through the command line.
s3distcp Combines smaller files and aggregates them together by taking in a pattern and target file. S3DistCp can also be used to transfer large volumes of data from S3 to your Hadoop cluster.
s3-parallel-put Uploads multiple files to S3 in parallel.
redshift Acts as a fast data warehouse built on top of massively parallel processing (MPP) technology.
kinesis Streams data in real time with the ability to process thousands of data streams per second.
lambda Runs code in response to events, automatically managing compute resources.

 

commands

IPython Notebook(s) demonstrating various command lines for Linux, Git, etc.

Notebook Description
linux Unix-like and mostly POSIX-compliant computer operating system. Covers disk usage, splitting files, grep, sed, curl, viewing running processes, terminal syntax highlighting, and Vim.
anaconda Distribution of the Python programming language for large-scale data processing, predictive analytics, and scientific computing, that aims to simplify package management and deployment.
ipython notebook Web-based interactive computational environment where you can combine code execution, text, mathematics, plots and rich media into a single document.
git Distributed revision control system with an emphasis on speed, data integrity, and support for distributed, non-linear workflows.
ruby Used to interact with the AWS command line and for Jekyll, a blog framework that can be hosted on GitHub Pages.
jekyll Simple, blog-aware, static site generator for personal, project, or organization sites. Renders Markdown or Textile and Liquid templates, and produces a complete, static website ready to be served by Apache HTTP Server, Nginx or another web server.
pelican Python-based alternative to Jekyll.
django High-level Python Web framework that encourages rapid development and clean, pragmatic design. It can be useful to share reports/analyses and for blogging. Lighter-weight alternatives include Pyramid, Flask, Tornado, and Bottle.

misc

IPython Notebook(s) demonstrating miscellaneous functionality.

Notebook Description
regex Regular expression cheat sheet useful in data wrangling.
algorithmia Algorithmia is a marketplace for algorithms. This notebook showcases 4 different algorithms: Face Detection, Content Summarizer, Latent Dirichlet Allocation and Optical Character Recognition.

notebook-installation

anaconda

Anaconda is a free distribution of the Python programming language for large-scale data processing, predictive analytics, and scientific computing that aims to simplify package management and deployment.

Follow instructions to install Anaconda or the more lightweight miniconda.

dev-setup

For detailed instructions, scripts, and tools to set up your development environment for data analysis, check out the dev-setup repo.

running-notebooks

To view interactive content or to modify elements within the IPython notebooks, you must first clone or download the repository and then run the notebooks. More information on IPython Notebooks can be found here.

$ git clone https://github.com/donnemartin/data-science-ipython-notebooks.git
$ cd data-science-ipython-notebooks
$ jupyter notebook

Notebooks tested with Python 2.7.x.

credits

contributing

Contributions are welcome! For bug reports or requests please submit an issue.

contact-info

Feel free to contact me to discuss any issues, questions, or comments.

license

This repository contains a variety of content; some developed by Donne Martin, and some from third-parties. The third-party content is distributed under the license provided by those parties.

The content developed by Donne Martin is distributed under the following license:

I am providing code and resources in this repository to you under an open source license. Because this is my personal repository, the license you receive to my code and resources is from me and not my employer (Facebook).

Copyright 2015 Donne Martin

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Using Keras to Build a Convolutional Neural Network

This article records how to implement a convolutional neural network (CNN) with Keras and train a model for image classification, along with the tuning of some CNN hyperparameters and my own understanding of them.

Dataset

http://www.ivl.disco.unimib.it/activities/large-age-gap-face-verification/

This image dataset contains paired photos of celebrities taken in their youth and in adulthood, each 100*100 pixels in RGB.

The images are split into two categories, adult and adolescent. The classifier built here is binary: the trained model classifies an input photo and assigns it an old or young label.

The dataset is organized into the following directory structure:

train
├── old
│   ├── 1200 images
├── young
│   ├── 1200 images

validation
├── old
│   ├── 600 images
├── young
│   ├── 600 images

test
├── old
│   ├── 110 images
├── young
│   ├── 110 images

train is the training set and validation is the validation set; validation is performed during training, so the data in both train and validation is used.

test is the held-out set used to evaluate the trained model.

Displaying the images

Below are some of the images in the train/old group:

%matplotlib inline

import os  
import matplotlib.pyplot as plt  
import matplotlib.image as mpimg

some_old_pics = os.listdir('train/old')
show_rows = show_columns = 5
# show a 5x5 grid of sample images from train/old
for k in range(show_rows * show_columns):
    image = mpimg.imread('train/old/%s' % some_old_pics[k])
    plt.subplot(show_rows, show_columns, k+1)
    plt.imshow(image)
    plt.axis('off')

The “average face”

This is simply the per-pixel mean of each group of images; the code is adapted from here.

import os, numpy, PIL  
from PIL import Image  
import matplotlib.pyplot as plt  
def plot_average_pic(group):
    # the two class directories inside the given group (e.g. 'train')
    dirs = ["%s/%s" % (group, i) for i in ['old', 'young']]
    for k in range(len(dirs)):
        dir = dirs[k]
        imlist = ['%s/%s' % (dir, c) for c in os.listdir(dir)]
        w, h = Image.open(imlist[0]).size
        N = len(imlist)
        arr = numpy.zeros((h, w, 3), numpy.float)
        # accumulate the per-pixel mean over all images in the directory
        for im in imlist:
            imarr = numpy.array(Image.open(im), dtype=numpy.float)
            arr = arr + imarr / N
        # round back to 8-bit pixel values and display the averaged face
        arr = numpy.array(numpy.round(arr), dtype=numpy.uint8)
        image = Image.fromarray(arr, mode="RGB")
        plt.subplot(2, 3, k+1)
        plt.imshow(image)
        plt.title(dir)
        plt.axis('off')

The result:

Quite magical…

Judging by the “average face” result, the two labels are indeed clearly distinguishable.

Building the CNN model

Imported modules:

from keras.preprocessing.image import ImageDataGenerator  
from keras.models import Sequential  
from keras.layers import Conv2D, MaxPooling2D  
from keras.layers import Activation, Dropout, Flatten, Dense  

First, define some parameters: the input size of all images (100*100, with 3 RGB channels), the number of train / validation samples, the number of training epochs, and the mini-batch gradient descent batch_size:

img_width, img_height = 100, 100

train_data_dir = './train'  
validation_data_dir = './validation'  
nb_train_samples = 1200  
nb_validation_samples = 600  
epochs = 100  
batch_size = 16

input_shape = (img_width, img_height, 3)  

Create the CNN:

model = Sequential()  
model.add(Conv2D(32, (3, 3), padding='same', input_shape=input_shape))  
model.add(Activation('relu'))  
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Conv2D(32, (3, 3), padding='same'))  
model.add(Activation('relu'))  
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Conv2D(64, (3, 3), padding='same'))  
model.add(Activation('relu'))  
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Flatten())  
model.add(Dense(64))  
model.add(Activation('relu'))  
model.add(Dropout(0.5))

model.add(Dense(1))  
model.add(Activation('sigmoid'))

model.compile(loss='binary_crossentropy',  
              optimizer='rmsprop',
              metrics=['accuracy'])

Define the ImageDataGenerator for train and for validation. Image augmentation expands the dataset by generating additional images through scaling, mirroring, rotation, and similar transformations:

train_datagen = ImageDataGenerator(  
    rescale=1. / 255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)
validation_datagen = ImageDataGenerator(  
    rescale=1. / 255)
train_generator = train_datagen.flow_from_directory(  
    train_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='binary')
validation_generator = validation_datagen.flow_from_directory(  
    validation_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='binary')

Start training:

from keras.callbacks import ModelCheckpoint, Callback, TensorBoard  
filepath="1.weights-improvement-{epoch:02d}-{val_acc:.2f}.h5"  
checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, save_best_only=True, mode='max')  
tensorboard = TensorBoard(log_dir='logs', histogram_freq=0)  
callbacks_list = [checkpoint, tensorboard]

model.fit_generator(  
    train_generator,
    steps_per_epoch=nb_train_samples // batch_size,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=nb_validation_samples // batch_size,
    callbacks=callbacks_list)

After just the first few epochs, the validation accuracy (val_acc) already reaches about 82%.

(One epoch takes 180s on an i7 CPU, but only 3s on a GTX1080 GPU.)

The best result after 100 epochs of training:

loss: 0.2957 - acc: 0.8875 - val_loss: 0.2994 - val_acc: 0.9139  

val_acc is even higher than acc…

The training process as seen in TensorBoard:
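(The logs written by the TensorBoard callback above go to the log_dir set in the callback, so the curves can be viewed locally with something like:)

$ tensorboard --logdir=logs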

Evaluating the model:

Load the best-performing model:

from keras.models import load_model  
best_model = load_model('a.weights-improvement-64-0.91.h5')  

Again use an ImageDataGenerator:

datagen = ImageDataGenerator()  
test_generator = datagen.flow_from_directory(  
        './test',
        target_size=(img_width, img_height),
        batch_size=batch_size,
        class_mode='binary')

Evaluate:

scores = best_model.evaluate_generator(
        test_generator,
        val_samples=220)  # note: newer Keras versions take `steps` (number of batches) instead
print(scores)

The result:

[2.7107817684386277, 0.82976878729858838]

Test accuracy reaches 82.98%, which is not bad, but there is still a clear gap from the 91.39% seen during training, which indicates noticeable overfitting.

dropout

Lower the Dropout rate of the third-to-last layer of the model to 0.2 to improve its ability to generalize:

model.add(Dropout(0.2))  

Training and test results:

[train] loss: 0.2599 - acc: 0.9008 - val_loss: 0.2315 - val_acc: 0.9105
[test]  loss: 2.1801 - acc: 0.8636 

Test accuracy improves to 86.36%.

optimizer

Change the optimizer from rmsprop to the currently popular adam:

model.compile(loss='binary_crossentropy',  
              optimizer='adam',
              metrics=['accuracy'])

Results:

[train] loss: 0.1916 - acc: 0.9250 - val_loss: 0.3205 - val_acc: 0.9037
[test]  loss: 1.4704 - acc: 0.8962

Test accuracy further improves to 89.62%.

For this model, the test result is quite close to the training result.

kernel size

Change the kernel size of the convolutional layers from (3, 3) to (5, 5):

Conv2D(32, (5, 5), padding='same', input_shape=input_shape)  

This can be understood as increasing the number of weight parameters in the network. Taking the first convolutional layer as an example, its number of weight parameters grows from:

3 * (3 * 3) * 32 + 32 = 896  

to:

3 * (5 * 5) * 32 + 32 = 2432  

(The leading 3 in each product is the depth of the layer's input, which for the first layer is the depth of the input image; the 32 that follows is the number of filters, and the added 32 is the number of biases, one per filter.)
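As a quick sanity check, this parameter count can be reproduced with a small helper (a minimal sketch, not from the original post; the same numbers appear in the Param # column of Keras' model.summary()):

def conv2d_param_count(input_depth, kernel_h, kernel_w, filters):
    # one (kernel_h x kernel_w x input_depth) kernel per filter, plus one bias per filter
    return input_depth * kernel_h * kernel_w * filters + filters

print(conv2d_param_count(3, 3, 3, 32))  # 896
print(conv2d_param_count(3, 5, 5, 32))  # 2432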

Results:

[train] loss: 0.1104 - acc: 0.9533 - val_loss: 0.3632 - val_acc: 0.9003
[test]  loss: 1.7149 - acc: 0.8902

The training-set accuracy now reaches a relatively high 95.33%, reflecting the positive effect of the larger number of weight parameters.

Judging from the test result, however, accuracy actually drops slightly compared with the original kernel size; perhaps the Dropout rate should be lowered as well.

On that basis, lowering the Dropout rate from 0.2 to 0.1 gives the following:

[train] loss: 0.1321 - acc: 0.9500 - val_loss: 0.3917 - val_acc: 0.8970
[test]  loss: 1.4034 - acc: 0.9124

Test accuracy exceeds 91%!

padding

Change the padding parameter of the convolutional layers back to the default valid, i.e.:

Conv2D(32, (5, 5))  

Keep the Dropout rate at 0.2.

Results:

[train] loss: 0.1768 - acc: 0.9208 - val_loss: 0.3337 - val_acc: 0.8986
[test]  loss: 1.7612 - acc: 0.8902

Judging from these results, there is not much difference.
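For context, the two padding modes differ only in the output spatial size of each convolution; a minimal sketch, assuming the stride-1 convolutions used above:

def conv_output_size(input_size, kernel_size, padding):
    # stride-1 Conv2D output height/width for the two Keras padding modes
    if padding == 'same':
        return input_size                    # zero-padded, spatial size preserved
    return input_size - kernel_size + 1      # 'valid': no padding

print(conv_output_size(100, 5, 'same'))   # 100
print(conv_output_size(100, 5, 'valid'))  # 96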

activation

Next, test a different activation function, for example PReLU. In Keras, PReLU comes from the advanced_activations module, and the model is modified as follows:

from keras.layers.advanced_activations import PReLU

model = Sequential()  
model.add(Conv2D(32, (5, 5), input_shape=input_shape, activation='linear'))  
model.add(PReLU())  
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Conv2D(32, (5, 5), activation='linear'))  
model.add(PReLU())  
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Conv2D(64, (5, 5), activation='linear'))  
model.add(PReLU())  
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Flatten())  
model.add(Dense(64))  
model.add(Activation('relu'))  
model.add(Dropout(0.1))

model.add(Dense(1))  
model.add(Activation('sigmoid'))

model.compile(loss='binary_crossentropy',  
              optimizer='adam',
              metrics=['accuracy'])

You first declare a convolutional layer with a linear activation, and then add a PReLU layer right after it.

Because PReLU actually adds weight parameters, a Dropout rate of 0.1 is used.

Results:

[train] loss: 0.2656 - acc: 0.8850 - val_loss: 0.2623 - val_acc: 0.9054
[test]  loss: 1.9551 - acc: 0.8737

Trying LeakyReLU(alpha=0.001) gives the following training result:

[train] loss: 0.2002 - acc: 0.9267 - val_loss: 0.2987 - val_acc: 0.8885

However, load_model raised an error when loading the best checkpoint, so the best model was not tested. Testing the model from the final training run, which reached val_acc: 0.8497, gives:

[test]  loss: 2.1069 - acc: 0.8688
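For reference, swapping PReLU for LeakyReLU only changes the activation layers; a minimal sketch of the first convolution block, assuming the same Keras-2-era API as the code above:

from keras.layers.advanced_activations import LeakyReLU

model.add(Conv2D(32, (5, 5), input_shape=input_shape, activation='linear'))
model.add(LeakyReLU(alpha=0.001))   # fixed small negative slope, unlike PReLU's learned one
model.add(MaxPooling2D(pool_size=(2, 2)))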

batch size

The batch size used during training also has some effect on the resulting model; see the discussion on Zhihu for details.

The actual tests are as follows (changing batch_size on top of the earlier model with 91% test accuracy):

Judging from this test alone, a batch size of 16 is still the best.

Compared with the batch size = 16 run, with batch size = 8 the acc climbs faster at the start, but val_acc oscillates more noticeably:
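For reference, changing the batch size only touches the generators and the derived step counts; a minimal sketch reusing the objects defined earlier (the validation generator would be rebuilt the same way):

batch_size = 8  # smaller mini-batches

train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='binary')

# steps_per_epoch and validation_steps must be recomputed from the new batch_size
model.fit_generator(
    train_generator,
    steps_per_epoch=nb_train_samples // batch_size,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=nb_validation_samples // batch_size,
    callbacks=callbacks_list)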

deeper

A deeper network.

Add one more convolutional layer; the model is modified as follows:

model = Sequential()  
model.add(Conv2D(32, (5, 5), padding='same', input_shape=input_shape))  
model.add(Activation('relu'))  
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Conv2D(32, (5, 5), padding='same'))  
model.add(Activation('relu'))  
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Conv2D(64, (5, 5), padding='same'))  
model.add(Activation('relu'))  
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Conv2D(128, (5, 5), padding='same'))  
model.add(Activation('relu'))  
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Flatten())  
model.add(Dense(128))  
model.add(Activation('relu'))  
model.add(Dropout(0.1))

model.add(Dense(1))  
model.add(Activation('sigmoid'))

model.compile(loss='binary_crossentropy',  
              optimizer='adam',
              metrics=['accuracy'])

Results:

[train] loss: 0.2415 - acc: 0.9075 - val_loss: 0.3782 - val_acc: 0.8581
[test]  loss: 2.5879 - acc: 0.8361

Overfitting is now clearly severe. For a training set with relatively little data, a deeper network is perhaps not a good choice, since it overfits more easily.

Finally

Using the model with 91% test accuracy, I grabbed some random images from Google and tested them; the results were not bad, as shown below:

Old group: 90% accuracy

What happened to you, 温老四…

Young group: 95% accuracy

詹老汉 steals the show.
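To reproduce this last step, here is a minimal sketch of scoring a single downloaded photo with the loaded model. The file name is made up; the sketch assumes the 100*100 RGB input size, the 1./255 rescaling used by the training generators, and the alphabetical class mapping produced by flow_from_directory (old = 0, young = 1):

from keras.preprocessing import image
import numpy as np

def predict_age_group(model, path):
    # load a single photo and preprocess it the same way as the training data
    img = image.load_img(path, target_size=(img_width, img_height))
    x = image.img_to_array(img) / 255.    # same 1./255 rescaling as during training
    x = np.expand_dims(x, axis=0)         # batch of one: shape (1, 100, 100, 3)
    p = model.predict(x)[0][0]            # sigmoid output in [0, 1]
    return 'young' if p > 0.5 else 'old'  # class indices: old = 0, young = 1

print(predict_age_group(best_model, 'some_photo_from_google.jpg'))  # hypothetical file name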

References

https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html

From: https://zhengheng.me/2017/08/30/keras-cnn/