Friday, 14 October 2016

Java 8 Streams

Definition:

A stream is a sequence of elements from a source that supports data processing operations.

Stream Operations:

There are two types of stream operations:
  1. Intermediate Operation : These return another stream, so they can be chained together to form a query (a pipeline). Intermediate operations don't perform any processing until a terminal operation is invoked, so they are lazy. ex: filter
  2. Terminal Operation : These produce a result from a stream pipeline. They return a non-stream value (or nothing at all). ex: forEach
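The laziness of intermediate operations can be observed directly. The minimal sketch below uses illustrative class and method names. It records every call to the filter predicate: nothing is recorded while the pipeline is only being built, and the predicate runs only when the terminal operation collect is invoked.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazinessDemo {

    // Every element the filter predicate is asked to test is recorded here.
    static final List<String> filterCalls = new ArrayList<>();

    // Builds a pipeline that has only an intermediate operation, so nothing runs yet.
    static Stream<String> buildPipeline() {
        return Stream.of("pork", "chicken", "rice")
                .filter(s -> {
                    filterCalls.add(s);
                    return s.length() == 4;
                });
    }

    public static void main(String[] args) {
        Stream<String> pipeline = buildPipeline();
        System.out.println("Predicate calls before terminal op: " + filterCalls.size());

        // collect is a terminal operation, so the filter is evaluated now.
        List<String> result = pipeline.collect(Collectors.toList());
        System.out.println("Result: " + result);
        System.out.println("Predicate calls after terminal op: " + filterCalls.size());
    }
}
```

Running it reports 0 predicate calls before collect and 3 after, with the result [pork, rice].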

1.Intermediate Operations:

i. filter : This operation takes a Predicate function and returns a stream including all the elements that match the predicate.

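ii. distinct : This returns a stream of the unique elements (uniqueness is decided by equals).

iii. limit : This returns a stream truncated to at most n elements.

iv. skip : This returns a stream that skips the first n elements.

v. map : This applies a Function to each element and returns a stream of the results.

vi. flatMap : This maps each element to a stream and then flattens all of those streams into a single stream. For an array of arrays, each inner array is replaced not by a stream but by its contents. To understand this, it is important to first understand Arrays.stream, described below.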

    Arrays.stream :
          String[][] strArr = new String[][]{{"P", "Q"}, {"R", "S"}};
          Stream<String[]> strArrStream = Arrays.stream(strArr);

     This is similar to converting the array to a List, where each element of the list is itself an array:
          List<String[]> lst = Arrays.asList(strArr);

     Another example of this is:

      Integer[][][] threeDArray = {  { {1,   2,  3}, { 4,  5,  6}, { 7,  8,  9} },
                     { {10, 11, 12}, {13, 14, 15}, {16, 17, 18} },
                     { {19, 20, 21}, {22, 23, 24}, {25, 26, 27} } };

      Stream<Integer[][]> intArr = Arrays.stream(threeDArray);

     Note: for a primitive array, Arrays.stream returns a primitive stream:
          int[] numbers = {2, 3, 5, 7, 11, 13};
          IntStream numberStream = Arrays.stream(numbers); //This returns an IntStream.

     For the above examples, flatMap is used as follows:

Arrays.asList(strArr).stream().flatMap(Arrays::stream).forEach(System.out::println); //prints P, Q, R, S
Arrays.asList(threeDArray).stream().flatMap(Arrays::stream).flatMap(Arrays::stream).forEach(System.out::println); //prints 1 to 27
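The sketch below combines the intermediate operations i to vi in one place. The class name and the input numbers are made up for illustration. The comments trace what is left after each step.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class IntermediateOpsDemo {

    static List<Integer> pipeline(List<Integer> numbers) {
        return numbers.stream()
                .filter(n -> n % 2 == 0)   // keep even numbers: 2, 4, 2, 6, 8, 4, 10
                .distinct()                // drop duplicates:   2, 4, 6, 8, 10
                .skip(1)                   // skip the first:    4, 6, 8, 10
                .limit(3)                  // keep at most 3:    4, 6, 8
                .map(n -> n * 10)          // transform each:    40, 60, 80
                .collect(Collectors.toList());
    }

    // flatMap replaces each inner array with its elements.
    static List<String> flatten(String[][] strArr) {
        return Arrays.stream(strArr)
                .flatMap(Arrays::stream)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 2, 5, 6, 8, 4, 10);
        System.out.println(pipeline(numbers));                                // [40, 60, 80]
        System.out.println(flatten(new String[][]{{"P", "Q"}, {"R", "S"}})); // [P, Q, R, S]
    }
}
```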

2.Terminal Operations:

vii. anyMatch : This returns true if at least one element matches the predicate.

viii. allMatch : This returns true if all the elements match the predicate.

ix. noneMatch : This returns true if none of the elements match the predicate.

x. findFirst : This returns the first element. The return type of this is Optional.

xi. findAny : This returns any element of the stream. The return type of this is Optional.

     When to use findFirst and findAny?

     Finding the first element is more constraining when the stream is processed in parallel. If you don't care which element is returned, use findAny.

xii. reduce : This combines all the elements of the stream into a single value.

For example, to find the sum of the numbers:

List<Integer> numbers = Arrays.asList(1,2,3,4,5);
int sum = numbers.stream().reduce(0,(a,b) -> a+b);

This takes two arguments:
i. An initial value, here zero.
ii. A BinaryOperator<T> to combine two elements and produce a new one.

There is also an overloaded version that takes no initial value. It returns an Optional, because the stream might be empty.
Optional<Integer> sum = numbers.stream().reduce((a,b) -> a+b);

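To find the maximum and minimum values, we can use the max and min methods, which take a Comparator on a Stream<T>. A sum method is available on the primitive streams such as IntStream. On a Stream<Integer>, reduce(0, Integer::sum) does the same job.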
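A small sketch of the terminal operations vii to xii, on a hypothetical list of numbers (class and method names are illustrative). findAny is run on a parallel stream here, where it may return either even number.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

class TerminalOpsDemo {

    static final List<Integer> NUMBERS = Arrays.asList(1, 2, 3, 4, 5);

    static boolean anyEven(List<Integer> xs)      { return xs.stream().anyMatch(n -> n % 2 == 0); }
    static boolean allPositive(List<Integer> xs)  { return xs.stream().allMatch(n -> n > 0); }
    static boolean noneNegative(List<Integer> xs) { return xs.stream().noneMatch(n -> n < 0); }

    // findFirst always yields the first even element in encounter order.
    static Optional<Integer> firstEven(List<Integer> xs) {
        return xs.stream().filter(n -> n % 2 == 0).findFirst();
    }

    // findAny may yield any even element, especially on a parallel stream.
    static Optional<Integer> someEven(List<Integer> xs) {
        return xs.parallelStream().filter(n -> n % 2 == 0).findAny();
    }

    // reduce with an initial value always returns a value.
    static int sum(List<Integer> xs) {
        return xs.stream().reduce(0, (a, b) -> a + b);
    }

    // max takes a Comparator and returns an Optional (empty for an empty stream).
    static Optional<Integer> max(List<Integer> xs) {
        return xs.stream().max(Comparator.naturalOrder());
    }

    // reduce without an initial value also returns an Optional.
    static Optional<Integer> min(List<Integer> xs) {
        return xs.stream().reduce(Integer::min);
    }

    public static void main(String[] args) {
        System.out.println(anyEven(NUMBERS) + " " + allPositive(NUMBERS) + " " + noneNegative(NUMBERS));
        System.out.println(firstEven(NUMBERS) + " " + someEven(NUMBERS));
        System.out.println(sum(NUMBERS) + " " + max(NUMBERS) + " " + min(NUMBERS));
    }
}
```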

Stream Operations : stateless vs stateful

Operations like map and filter take each element from the input stream and produce zero or one result in the output, so they are stateless. Operations like reduce, sum and max, on the other hand, need an internal state to accumulate the result, so they are stateful. That internal state has a bounded size no matter how many elements the stream has (for example, a single running total). Operations like sorted and distinct are also stateful, but their state is unbounded, because they may need to buffer all the elements of the stream.
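The difference can be seen in the order in which elements move through a pipeline. This sketch (names are illustrative) logs when each element is read and when it reaches forEach, on a sequential stream. With only stateless operations, each element goes all the way through before the next one is read. sorted has to buffer every element before it can emit the first one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class StatefulDemo {

    // Only stateless operations: expected trace is read 3, out 30, read 1, out 10, read 2, out 20.
    static List<String> statelessTrace() {
        List<String> trace = new ArrayList<>();
        Stream.of(3, 1, 2)
                .peek(n -> trace.add("read " + n))
                .map(n -> n * 10)
                .forEach(n -> trace.add("out " + n));
        return trace;
    }

    // sorted is stateful: all elements are read before any of them is emitted.
    static List<String> statefulTrace() {
        List<String> trace = new ArrayList<>();
        Stream.of(3, 1, 2)
                .peek(n -> trace.add("read " + n))
                .sorted()
                .forEach(n -> trace.add("out " + n));
        return trace;
    }

    public static void main(String[] args) {
        System.out.println(statelessTrace());
        System.out.println(statefulTrace());
    }
}
```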

Numeric Streams:

int calories = menu.stream().map(Dish::getCalories).reduce(0,Integer::sum);

The problem with this code is that there is a hidden boxing cost: behind the scenes, each Integer needs to be unboxed to a primitive int before the summation is performed. This is why Java 8 provides the primitive stream specializations IntStream, LongStream and DoubleStream.

IntStream intStream= menu.stream().mapToInt(Dish::getCalories);

Converting back to Stream of Objects:

Stream<Integer> stream = intStream.boxed();

Optional values for primitive streams: these have their own Optional types (OptionalInt, OptionalLong and OptionalDouble).

OptionalInt maxCalories = menu.stream().mapToInt(Dish::getCalories).max();

Providing a default value if there is no maximum.

int max = maxCalories.orElse(1);

Numeric ranges:

IntStream evenNumbers = IntStream.rangeClosed(1, 100).filter(n -> n % 2 == 0);
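The numeric stream steps above are collected into one runnable sketch. To stay self-contained, it uses the calorie values of the menu as a plain List<Integer> instead of Dish objects. The class and method names are illustrative.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalInt;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class NumericStreamsDemo {

    // Calorie values of the menu used in this post.
    static final List<Integer> CALORIES = Arrays.asList(800, 700, 400, 530, 350, 120, 550, 400, 450);

    // mapToInt turns a Stream<Integer> into an IntStream, so sum() works on primitive ints.
    static int total(List<Integer> calories) {
        return calories.stream().mapToInt(Integer::intValue).sum();
    }

    // max() on an IntStream returns an OptionalInt, which is empty for an empty stream.
    static int maxOrDefault(List<Integer> calories, int defaultValue) {
        OptionalInt max = calories.stream().mapToInt(Integer::intValue).max();
        return max.orElse(defaultValue);
    }

    // rangeClosed includes both bounds, so this counts the even numbers in 1..100.
    static long evenCount() {
        return IntStream.rangeClosed(1, 100).filter(n -> n % 2 == 0).count();
    }

    // boxed() converts an IntStream back into a Stream<Integer>; range excludes the upper bound.
    static List<Integer> boxedRange(int from, int to) {
        return IntStream.range(from, to).boxed().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(total(CALORIES));
        System.out.println(maxOrDefault(CALORIES, 1));
        System.out.println(evenCount());
        System.out.println(boxedRange(1, 4));
    }
}
```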

Building Streams:

i. Streams from values:

Stream<String> stream = Stream.of("Java 8","Stream","Lambdas");

ii. Streams from arrays:

int[] numbers = {2, 3, 5, 7, 11, 13};
IntStream intStream = Arrays.stream(numbers);

Creating infinite streams:

Stream.iterate and Stream.generate let you create an infinite stream, so it is sensible to combine them with limit; otherwise the stream never ends. Examples of both are below.

Stream.iterate takes an initial value and a UnaryOperator, and produces each new value by applying that operator to the previous one. It is stateful in the sense that each output depends on the value produced before it.

Stream.iterate(0,n->n+2).limit(10).forEach(System.out::println);

This can be useful if you want to produce dates in a sequential order.

Stream.generate

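This takes a Supplier<T>. Unlike iterate, it does not apply a function to the previously produced value, so it is stateless as long as the supplier itself keeps no state.

Stream.generate(Math::random).limit(5).forEach(System.out::println);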
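Here the ways of building streams described above are gathered into one sketch, including the sequential dates mentioned for Stream.iterate. The class and method names are illustrative, and the start date passed in main is only an example.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class BuildingStreamsDemo {

    // Stream.of: a stream built from explicit values.
    static List<String> upperCased() {
        return Stream.of("Java 8", "Stream", "Lambdas")
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    // Arrays.stream on an int[] gives an IntStream.
    static int sumOfPrimes() {
        int[] numbers = {2, 3, 5, 7, 11, 13};
        return Arrays.stream(numbers).sum();
    }

    // Stream.iterate: each element is computed from the previous one; limit makes it finite.
    static List<Integer> firstEvenNumbers(int count) {
        return Stream.iterate(0, n -> n + 2).limit(count).collect(Collectors.toList());
    }

    // Consecutive dates, starting from the given day.
    static List<LocalDate> consecutiveDays(LocalDate start, int days) {
        return Stream.iterate(start, d -> d.plusDays(1)).limit(days).collect(Collectors.toList());
    }

    // Stream.generate: each element comes from the Supplier, independently of the previous one.
    static List<Double> randomNumbers(int count) {
        return Stream.generate(Math::random).limit(count).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(upperCased());
        System.out.println(sumOfPrimes());
        System.out.println(firstEvenNumbers(5));
        System.out.println(consecutiveDays(LocalDate.of(2016, 10, 14), 3));
        System.out.println(randomNumbers(3));
    }
}
```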

Collectors:

The three important functionalities of the predefined collectors are:

1.Reducing and summarizing the stream to single value.
2.Grouping elements.
3.Partitioning elements.


1.Reducing and Summarizing.

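i. counting : This gives the number of elements.
ii. maxBy : This takes a Comparator and gives the maximum element as an Optional (minBy is its counterpart).
iii. summingInt : This gives the sum of an int-valued property.
iv. averagingInt : This gives the average of an int-valued property.
v. summarizingInt : This gives all the statistics (count, sum, min, average and max) in one IntSummaryStatistics object.
vi. joining : This concatenates the String values, optionally with a delimiter.
vii. reducing : This is the general reduction collector. It takes an initial value, a function to transform each element and a BinaryOperator to combine the values: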

int totalCalories = menu.stream().collect(reducing(0,Dish::getCalories,(i,j) -> i + j));
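Here is a sketch that applies each of the collectors i to vii to the menu data. The names and calorie values come from the menu used in this post but are kept as plain lists, so the example needs no Dish class. The class and method names are illustrative.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.Optional;

import static java.util.stream.Collectors.averagingInt;
import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.joining;
import static java.util.stream.Collectors.maxBy;
import static java.util.stream.Collectors.reducing;
import static java.util.stream.Collectors.summarizingInt;
import static java.util.stream.Collectors.summingInt;

class SummarizingDemo {

    static final List<String> NAMES = Arrays.asList("pork", "beef", "chicken", "french fries",
            "rice", "season fruit", "pizza", "prawns", "salmon");
    static final List<Integer> CALORIES = Arrays.asList(800, 700, 400, 530, 350, 120, 550, 400, 450);

    static long count() {
        return CALORIES.stream().collect(counting());
    }

    static Optional<Integer> highest() {
        Comparator<Integer> natural = Comparator.naturalOrder();
        return CALORIES.stream().collect(maxBy(natural));
    }

    static int total() {
        return CALORIES.stream().collect(summingInt(Integer::intValue));
    }

    static double average() {
        return CALORIES.stream().collect(averagingInt(Integer::intValue));
    }

    // One pass that yields count, sum, min, average and max together.
    static IntSummaryStatistics statistics() {
        return CALORIES.stream().collect(summarizingInt(Integer::intValue));
    }

    static String menuNames() {
        return NAMES.stream().collect(joining(", "));
    }

    // Initial value, transformation function and combining operator, as in reducing above.
    static int totalViaReducing() {
        return CALORIES.stream().collect(reducing(0, Integer::intValue, (i, j) -> i + j));
    }

    public static void main(String[] args) {
        System.out.println(count() + " " + highest() + " " + total() + " " + average());
        System.out.println(statistics());
        System.out.println(menuNames());
        System.out.println(totalViaReducing());
    }
}
```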

Collect Vs Reduce:

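reduce is meant to combine two values and produce a new one, so it is an immutable reduction. collect is designed to mutate a container to accumulate the result, so it is a mutable reduction. Using reduce with a mutable container is error-prone, especially when the stream runs in parallel.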
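The difference shows up when the same list is built both ways. This is a minimal sketch with illustrative names. The collect version creates a container and mutates it in place. The reduce version never modifies its inputs and returns a new list at every step, which is correct but copies a lot.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CollectVsReduceDemo {

    // collect: the supplier creates a mutable container, the accumulator mutates it,
    // and the combiner merges two containers (used when running in parallel).
    static ArrayList<Integer> viaCollect(List<Integer> numbers) {
        return numbers.stream().collect(ArrayList::new, ArrayList::add, ArrayList::addAll);
    }

    // reduce: every step returns a NEW list; neither the identity nor the inputs are modified.
    static List<Integer> viaReduce(List<Integer> numbers) {
        List<Integer> empty = new ArrayList<>();
        return numbers.stream().reduce(
                empty,
                (acc, n) -> {
                    List<Integer> copy = new ArrayList<>(acc);
                    copy.add(n);
                    return copy;
                },
                (left, right) -> {
                    List<Integer> merged = new ArrayList<>(left);
                    merged.addAll(right);
                    return merged;
                });
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4);
        System.out.println(viaCollect(numbers));
        System.out.println(viaReduce(numbers));
    }
}
```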

Doing the same Operation in different ways:

Finding the sum of the calories.

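import java.util.Arrays;
import java.util.List;

import static java.util.stream.Collectors.reducing;
import static java.util.stream.Collectors.summingInt;

public class MyTest {

    public static void main(String[] args) {
        List<Dish> menu = Arrays.asList(new Dish("pork", false, 800, Dish.Type.MEAT),
                new Dish("beef", false, 700, Dish.Type.MEAT),
                new Dish("chicken", false, 400, Dish.Type.MEAT),
                new Dish("french fries", true, 530, Dish.Type.OTHER),
                new Dish("rice", true, 350, Dish.Type.OTHER),
                new Dish("season fruit", true, 120, Dish.Type.OTHER),
                new Dish("pizza", true, 550, Dish.Type.OTHER),
                new Dish("prawns", false, 400, Dish.Type.FISH),
                new Dish("salmon", false, 450, Dish.Type.FISH));

        //Method1: Using reduce: this takes a BinaryOperator whose input and output types are the same
        //Cons: the primitive int calories are boxed to Integer, which reduces performance
        Integer sum1 = menu.stream().map(Dish::getCalories).reduce((d1, d2) -> d1 + d2).get();
        System.out.println("Sum1 : " + sum1);

        //Overloaded reduce: initial value, accumulator (running total + dish calories) and combiner
        Integer sum1over = menu.stream().reduce(0, (total, dish) -> total + dish.getCalories(), Integer::sum);
        System.out.println(sum1over);

        //Method2: Using IntStream: no boxing, so this is the best approach to take
        int sum2 = menu.stream().mapToInt(Dish::getCalories).sum();
        System.out.println("Sum2 : " + sum2);

        //Method3: Using the collect method with summingInt
        Integer sum3 = menu.stream().collect(summingInt(Dish::getCalories));
        System.out.println("Sum3 : " + sum3);

        //Method4: Using Collectors.reducing
        Integer sum4 = menu.stream().map(Dish::getCalories).collect(reducing(Integer::sum)).get();
        System.out.println("Sum4 : " + sum4);

        Integer sum4over = menu.stream().collect(reducing(0, Dish::getCalories, (i, j) -> i + j));
        System.out.println(sum4over);
    }
}

// Minimal Dish class assumed for this example (the post does not show it)
class Dish {
    enum Type { MEAT, FISH, OTHER }

    private final String name;
    private final boolean vegetarian;
    private final int calories;
    private final Type type;

    Dish(String name, boolean vegetarian, int calories, Type type) {
        this.name = name;
        this.vegetarian = vegetarian;
        this.calories = calories;
        this.type = type;
    }

    public String getName() { return name; }
    public boolean isVegetarian() { return vegetarian; }
    public int getCalories() { return calories; }
    public Type getType() { return type; }
}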

2.Grouping

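Grouping is done with the groupingBy collector. It groups the elements by the key returned by a classification function, giving a Map from each key to the list of elements with that key.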

              Map<Dish.Type,List<Dish>> groupOfMap = menu .stream().collect(groupingBy (Dish::getType));

groupingBy also supports multilevel grouping, by passing another groupingBy as the second (downstream) argument. It can also use other downstream collectors, such as counting().
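Both features are shown in the sketch below. It uses a hypothetical, trimmed-down Dish with only a name, calories and type, built from the menu in this post. The caloric levels ("DIET", "NORMAL", "FAT") and their thresholds are arbitrary, chosen only for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.mapping;
import static java.util.stream.Collectors.toList;

class GroupingDemo {

    enum Type { MEAT, FISH, OTHER }

    // Hypothetical, trimmed-down Dish with only the fields this example needs.
    static class Dish {
        private final String name;
        private final int calories;
        private final Type type;

        Dish(String name, int calories, Type type) {
            this.name = name;
            this.calories = calories;
            this.type = type;
        }

        String getName() { return name; }
        int getCalories() { return calories; }
        Type getType() { return type; }
    }

    static final List<Dish> MENU = Arrays.asList(
            new Dish("pork", 800, Type.MEAT), new Dish("beef", 700, Type.MEAT),
            new Dish("chicken", 400, Type.MEAT), new Dish("french fries", 530, Type.OTHER),
            new Dish("rice", 350, Type.OTHER), new Dish("season fruit", 120, Type.OTHER),
            new Dish("pizza", 550, Type.OTHER), new Dish("prawns", 400, Type.FISH),
            new Dish("salmon", 450, Type.FISH));

    // Illustrative calorie buckets; the thresholds are arbitrary.
    static String caloricLevel(Dish d) {
        if (d.getCalories() <= 400) return "DIET";
        if (d.getCalories() <= 700) return "NORMAL";
        return "FAT";
    }

    // Two-level grouping: type -> caloric level -> dish names.
    static Map<Type, Map<String, List<String>>> byTypeAndLevel() {
        return MENU.stream().collect(
                groupingBy(Dish::getType,
                        groupingBy(GroupingDemo::caloricLevel,
                                mapping(Dish::getName, toList()))));
    }

    // A downstream collector other than toList: count the dishes per type.
    static Map<Type, Long> countByType() {
        return MENU.stream().collect(groupingBy(Dish::getType, counting()));
    }

    public static void main(String[] args) {
        System.out.println(byTypeAndLevel());
        System.out.println(countByType());
    }
}
```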

3.Partitioning

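Partitioning is done with the partitioningBy collector. It takes a Predicate and returns a Map<Boolean, List<T>> holding the elements for which the predicate is true and those for which it is false.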

menu.stream().collect(partitioningBy(Dish::isVegetarian)).entrySet().forEach(entry -> System.out.println(entry.getKey()+ " "+entry.getValue()));
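Like groupingBy, partitioningBy accepts a downstream collector. The sketch below keeps things self-contained by partitioning plain integers into evens and odds rather than dishes. The class name is illustrative. The resulting map has both the true and the false key even when one side is empty.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.partitioningBy;

class PartitioningDemo {

    // Splits the numbers into evens (true) and odds (false).
    static Map<Boolean, List<Integer>> evensAndOdds(List<Integer> numbers) {
        return numbers.stream().collect(partitioningBy(n -> n % 2 == 0));
    }

    // partitioningBy with a downstream collector, here counting().
    static Map<Boolean, Long> countEvensAndOdds(List<Integer> numbers) {
        return numbers.stream().collect(partitioningBy(n -> n % 2 == 0, counting()));
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        System.out.println(evensAndOdds(numbers));
        System.out.println(countEvensAndOdds(numbers));
        // Both keys are present even though there are no even numbers here.
        System.out.println(evensAndOdds(Arrays.asList(1, 3, 5)));
    }
}
```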

Implementing a user defined Collector:

Below are methods defined in the Collector interface.

public interface Collector<T,A,R> {
     Supplier<A> supplier();
     BiConsumer<A,T> accumulator();
     Function<A,R> finisher();
     BinaryOperator<A> combiner();
     Set<Characteristics> characteristics();
}

In this: T -> Generic type of the items to be collected.
           A -> The type of the object in which the partial result is accumulated during the collection process.
           R -> The type of the result of the collect operation.

1.Supplier : It returns a Supplier that creates a new, empty result container.
2.Accumulator : It returns a BiConsumer that adds an element to the result container (the reduction step).
3.Finisher : It returns a Function that is invoked at the end of the accumulation process to turn the result container into the final result.
4.Combiner : It returns a BinaryOperator that defines how two result containers, built from different parts of the stream, are merged when the stream is processed in parallel.
5.Characteristics : It returns a set of hints about the collector's behaviour. UNORDERED, CONCURRENT and IDENTITY_FINISH are the ones which can be set.
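A user-defined collector that gathers the elements into a List implements the five methods as shown in this sketch. The class name ToListCollector is illustrative. Only IDENTITY_FINISH is declared, because the accumulator is already the final result. CONCURRENT is left out because ArrayList is not thread-safe.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;
import java.util.stream.Stream;

class ToListCollector<T> implements Collector<T, List<T>, List<T>> {

    // Creates the empty result container.
    @Override
    public Supplier<List<T>> supplier() {
        return ArrayList::new;
    }

    // Adds one element to the container.
    @Override
    public BiConsumer<List<T>, T> accumulator() {
        return List::add;
    }

    // Merges two partial results (used when the stream runs in parallel).
    @Override
    public BinaryOperator<List<T>> combiner() {
        return (left, right) -> {
            left.addAll(right);
            return left;
        };
    }

    // The container already is the result, so the finisher is the identity.
    @Override
    public Function<List<T>, List<T>> finisher() {
        return Function.identity();
    }

    @Override
    public Set<Collector.Characteristics> characteristics() {
        return Collections.unmodifiableSet(EnumSet.of(Collector.Characteristics.IDENTITY_FINISH));
    }
}

class ToListCollectorDemo {
    public static void main(String[] args) {
        List<String> dishes = Stream.of("pork", "beef", "rice").collect(new ToListCollector<String>());
        System.out.println(dishes);
        System.out.println(dishes.equals(Arrays.asList("pork", "beef", "rice")));
    }
}
```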

Parallel data processing:

This helps to process the elements of a collection in parallel. Below is an example that sums the numbers from 1 to n in parallel:

Stream.iterate(1L,i -> i+1).limit(n).parallel().reduce(0L,Long::sum);

Here we are turning a sequential stream into a parallel one. Calling parallel() only sets a flag on the stream. The whole pipeline is then executed in parallel when the terminal operation runs. If both sequential() and parallel() are called on the same pipeline, the last call decides whether the whole pipeline runs sequentially or in parallel. A parallel stream splits the data using a Spliterator.
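A runnable version of the example, with a hypothetical class name, follows. It adds a LongStream.rangeClosed variant for comparison. Stream.iterate is generally considered a poor fit for parallelism, because each value depends on the previous one and every number is boxed. rangeClosed produces primitive longs and splits easily into chunks. The last method checks the "last call wins" rule with isParallel.

```java
import java.util.stream.LongStream;
import java.util.stream.Stream;

class ParallelSumDemo {

    // Same computation as in the text: boxed Longs from a source that is hard to split.
    static long iterativeParallelSum(long n) {
        return Stream.iterate(1L, i -> i + 1).limit(n).parallel().reduce(0L, Long::sum);
    }

    // Primitive longs from a source that splits easily into independent ranges.
    static long rangedParallelSum(long n) {
        return LongStream.rangeClosed(1, n).parallel().reduce(0L, Long::sum);
    }

    static long sequentialSum(long n) {
        return LongStream.rangeClosed(1, n).reduce(0L, Long::sum);
    }

    // parallel() followed by sequential(): the last call wins, so this returns false.
    static boolean lastCallWins() {
        return LongStream.rangeClosed(1, 10).parallel().sequential().isParallel();
    }

    public static void main(String[] args) {
        long n = 1_000_000L;
        System.out.println(iterativeParallelSum(n));
        System.out.println(rangedParallelSum(n));
        System.out.println(sequentialSum(n));
        System.out.println(lastCallWins());
    }
}
```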

Spliterator:

The Spliterator is an interface used to traverse the elements of a source and, unlike an Iterator, to split them into parts so that they can be processed in parallel.

public interface Spliterator<T> {
     boolean tryAdvance(Consumer<? super T> action);
     Spliterator<T> trySplit();
     long estimateSize();
     int characteristics();
}

The trySplit method is where the splitting logic is defined. It returns a new Spliterator covering part of the remaining elements, or null if they should not be split any further. By implementing the above methods we can write our own Spliterator.
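A minimal custom Spliterator over a range of an int[] might look like the sketch below. The class names are illustrative. trySplit hands the first half of the remaining range to a new Spliterator and keeps the second half, which preserves encounter order. StreamSupport.stream then turns it into a parallel stream. A real implementation would normally stop splitting at a larger minimum size than 2.

```java
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.stream.StreamSupport;

class IntArraySpliterator implements Spliterator<Integer> {

    private final int[] array;
    private int current;       // next index to visit
    private final int end;     // exclusive upper bound

    IntArraySpliterator(int[] array, int from, int to) {
        this.array = array;
        this.current = from;
        this.end = to;
    }

    // Feeds the next element to the action, if there is one.
    @Override
    public boolean tryAdvance(Consumer<? super Integer> action) {
        if (current < end) {
            action.accept(array[current++]);
            return true;
        }
        return false;
    }

    // Splitting logic: give away the first half, keep the second half.
    @Override
    public Spliterator<Integer> trySplit() {
        int remaining = end - current;
        if (remaining < 2) {
            return null;  // too small to split
        }
        int mid = current + remaining / 2;
        Spliterator<Integer> prefix = new IntArraySpliterator(array, current, mid);
        current = mid;
        return prefix;
    }

    @Override
    public long estimateSize() {
        return end - current;
    }

    @Override
    public int characteristics() {
        return ORDERED | SIZED | SUBSIZED | IMMUTABLE | NONNULL;
    }
}

class SpliteratorDemo {

    // true = parallel stream, so the framework calls trySplit to divide the work.
    static int parallelSum(int[] data) {
        return StreamSupport.stream(new IntArraySpliterator(data, 0, data.length), true)
                .mapToInt(Integer::intValue)
                .sum();
    }

    public static void main(String[] args) {
        int[] data = new int[1000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(parallelSum(data));
    }
}
```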
