By Thomas Anowez


2018-09-14 15:06:00 8 Comments

If I have a Perl array with the following structure (Date, Time, User), sorted by user:

open my $fh, '<', $file;
while ( <$fh> ) {
  my ($user, $y, $m, $d, $time) = /\A(\w+);(\d+)\/(\d+)\/(\d+);(\d+:\d+:\d+\.\d+)/;  # Extract the fields
  push @evts, { user => $user, date => "$y/$m/$d", time => $time };  # Load the array of hashrefs
} # This was missing.
close($fh);
my @by_usr = sort { $a->{user} cmp $b->{user} } @evts;

How can I remove duplicate entries from it if they have exactly the same time?

$VAR1 = {
          'time' => '08:08:36.120',
          'date' => '2018/08/06',
          'user' => 'USER1'
        };
$VAR2 = {
          'time' => '08:08:36.120',
          'date' => '2018/08/06',
          'user' => 'USER1'
        };
...(and more)

I've tried with a uniq function, but it doesn't work:

sub uniq {
    my %seen;
    grep !$seen{$_}++, @_;
}
my @unique_events = uniq(@by_usr);

I'm happy to provide any clarification.

@simbabque 2018-09-14 15:19:41

This answer is for eliminating complete duplicate hash references!

If you only want the time key to be unique, see ysth's answer.

Your implementation of uniq only works if those references point to the same memory, because %seen is keyed on the stringified reference. Most likely they don't; they are distinct hashes that merely contain the same values.
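
To see why, note that two hashes with identical contents still stringify to different addresses, so they produce different %seen keys (a minimal illustration; the addresses are examples):

my %a = ( user => 'USER1' );
my %b = ( user => 'USER1' );
print \%a, "\n";   # e.g. HASH(0x560a1c2b38e8)
print \%b, "\n";   # e.g. HASH(0x560a1c2b3a40) -- a different %seen key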

You need to look at the values inside each element and build your uniq key from those. The easiest way is to simply concatenate all values in a known order, with a delimiter between the fields, like you would in a CSV export. You could also run the result through a digest algorithm (like MD5, which is discouraged for cryptography but has a negligible chance of collision here).

sub uniq {
  my %seen;
  grep {
    my $e = $_;
    # Build the key from all values, in stable (sorted-key) order.
    my $key = join '___', map { $e->{$_} } sort keys %$e;
    !$seen{$key}++
  } @_;
}

I picked ___ as a delimiter because it's unlikely to appear in your data. Since the sub takes the keys and sorts them, it can be used universally.
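
If the concatenated keys get long, the digest variant mentioned above could look like this (a sketch; Digest::MD5 is a core module, and uniq_digest is a name chosen here):

use Digest::MD5 'md5_hex';

sub uniq_digest {
  my %seen;
  grep {
    my $e = $_;
    # Hash the joined values so the %seen keys stay short.
    my $key = md5_hex join "\0", map { $e->{$_} } sort keys %$e;
    !$seen{$key}++
  } @_;
}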

Also note that you can eliminate the duplicates before you sort by user, as sketched below. If you intend to sort by other columns as well, this will save you additional work. Depending on the number of lines in your input data, reducing the size first will in general be faster than sorting first.
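
In code, that reordering is just (a sketch reusing the uniq above):

my @unique_events = uniq(@evts);                                  # dedupe first
my @by_usr = sort { $a->{user} cmp $b->{user} } @unique_events;   # then sort the smaller list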

@Thomas Anowez 2018-09-14 15:30:21

I confirm that it works.

@ysth 2018-09-14 15:33:41

OP only wanted to dedupe on time, not on all fields

@simbabque 2018-09-14 15:43:40

@ysth oops, then yours is correct of course

@ysth 2018-09-14 15:44:54

Yours may be what they actually want, despite the question :)

@ikegami 2018-09-15 07:30:41

Re "This answer is for eliminating complete duplicate hash references!", ....assuming all the hashes have the same keys. The following avoids that problem, and avoids the problems with using sentinel values: my $key = pack '(J/a*)*', map { $_ => $e->{$_} } sort keys %$e; Could also use state $json = Cpanel::JSON::XS->new->canonical; my $key = $json->encode($_);

@ysth 2018-09-14 15:38:34

You are only checking if the hash references (when stringified) are unique. To check for unique times, just, well, do that.

grep !$seen{$_->{'time'}}++, @_;

simbabque's answer checks if any value is different, not just time (assuming all the hashes have the same keys and no values contain ___).
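
Dropped into the question's sub, that grep looks like this (uniq_by_time is a name chosen here for clarity):

sub uniq_by_time {
    my %seen;
    grep !$seen{ $_->{'time'} }++, @_;
}

my @unique_events = uniq_by_time(@by_usr);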

@Borodin 2018-09-14 15:35:58

  • You have made this far harder than necessary by using a regex to parse your data: there is no need for anything more than a split /;/

  • Please indent your code properly. You will find it much easier to work with, and it is only polite when you are asking others for help with it. As it stands, your code won't even compile, and I have had to fix it before working on the problem at hand

  • You should also use strict and use warnings 'all' at the top of every Perl program you write, and declare every variable as close as possible to its first point of use with my

  • You should always check that an open call has succeeded, and invoke die on any error with the value of $! in the die string to say why it failed. It doesn't make sense to continue running most programs if the source of input data is unavailable

To make a list of unique entries, you can use the uniq_by function from the List::UtilsBy module. This isn't a core module and is likely to need installing
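
For example, one way to install it is with the cpanm client:

cpanm List::UtilsBy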

Here's how I would write your code

use strict;
use warnings 'all';

use List::UtilsBy 'uniq_by';

my $file = 'evts.txt';

my @evts;

{
    open my $fh, '<', $file or die qq{Unable to open "$file" for input: $!};

    while ( <$fh> ) {
        chomp;

        my ( $user, $date, $time ) = split /;/;

        push @evts, {
            user => $user,
            date => $date,
            time => $time,
        };
    }
}

my @uniq = uniq_by { $_->{time} } @evts;

use Data::Dump;
dd \@uniq;

evts.txt

USER1;2018/08/06;08:08:36.120
USER1;2018/08/06;08:08:36.120

output

[
  { date => "2018/08/06", time => "08:08:36.120", user => "USER1" },
]
