From: Tomasz P. <tom...@gm...> - 2014-06-17 08:45:20
Hi Rob

Here's a repo with failing code: https://bitbucket.org/tpluscode/sparql-serialize-test

Tom

On Mon, Jun 16, 2014 at 6:04 PM, Tomasz Pluskiewicz <tom...@gm...> wrote:
> Will do tomorrow. I've got an example on my work computer.
>
> Tom
>
> On Mon, Jun 16, 2014 at 6:01 PM, Rob Vesse <rv...@do...> wrote:
>> Then please provide a minimal reproducible test case that shows the
>> problem so I can look into it.
>>
>> Thanks,
>>
>> Rob
>>
>> On 16/06/2014 16:55, "Tomasz Pluskiewicz" <tom...@gm...> wrote:
>>
>>> On Mon, Jun 16, 2014 at 4:55 PM, Rob Vesse <rv...@do...> wrote:
>>>> Which version is this? I'm assuming pre 1.0.4 or lower?
>>>
>>> No, I got that behaviour in 1.0.5.x as well.
>>>
>>>> Since you are talking about Store Manager then this sounds like
>>>> TOOLS-409, which was already reported and fixed for the 1.0.5 release.
>>>> It was a bug in how Store Manager passed the data to the underlying
>>>> writers.
>>>
>>> This isn't only Store Manager but also instances of TripleStore.
>>> When LoadFromFile() or LOAD <x> is used it's fine, but after updating
>>> a fresh store with INSERT DATA, the wrong dataset is serialized.
>>>
>>>> Note that serializing an in-memory store directly in code was not
>>>> affected in any way.
>>>>
>>>> The creation of the empty default graph is an implementation detail. Any
>>>> SPARQL Update which inserts data potentially causes the default graph to
>>>> be created, because the SPARQL specification states that a dataset always
>>>> contains an unnamed default graph, and some parts of the implementation
>>>> assume graphs will already exist, so it is safer and faster to pre-create
>>>> any graphs that will potentially be affected.
>>>>
>>>> LOAD is a special case because it is really just a shim to the parser
>>>> sub-system, and the way the parser sub-system generates data means that
>>>> only specifically mentioned graphs are ever created.
>>>>
>>>> Rob
>>>>
>>>> On 16/06/2014 15:14, "Tomasz Pluskiewicz" <tom...@gm...> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> We've noticed weird behaviour with the in-memory triple store when
>>>>> serializing to NQuads. Here's what happens:
>>>>>
>>>>> 1. Create an empty TripleStore.
>>>>> 2. Run this SPARQL Update:
>>>>>
>>>>> INSERT DATA {
>>>>>   GRAPH <http://test.org/user> {
>>>>>     <http://test.org/user>
>>>>>       <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
>>>>>       <http://schema.org/Person> .
>>>>>     <http://test.org/user> <http://some/ontology/favorite>
>>>>>       <http://test.org/product/name> .
>>>>>   }
>>>>>   GRAPH <http://test.org/prodList/> {
>>>>>     <http://test.org/user> <http://xmlns.com/foaf/0.1/primaryTopic>
>>>>>       <http://test.org/user> .
>>>>>   }
>>>>> }
>>>>>
>>>>> 3. Serialize to NQuads. Store Manager correctly reports that 3 triples
>>>>> were serialized across 3 graphs (including the empty default graph).
>>>>>
>>>>> The output file contains all the triples, but without graph names, so
>>>>> they are all serialized into the default graph. It's not a problem with
>>>>> the in-memory store itself: the insert creates the correct graphs with
>>>>> the data. I've confirmed this occurs in all versions since 1.0.0.
>>>>>
>>>>> Curiously, when data is loaded with a LOAD <x> INTO GRAPH <y> command
>>>>> or with the dotNetRDF API, the store is serialized correctly. Only
>>>>> INSERT DATA causes the problem.
>>>>>
>>>>> Is this a known problem?
>>>>>
>>>>> And by the way, why does INSERT DATA create an empty default graph in
>>>>> the store, while loading or LOAD <x> only creates those graphs actually
>>>>> included in the source files?
>>>>>
>>>>> Greets,
>>>>> Tom
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> HPCC Systems Open Source Big Data Platform from LexisNexis Risk Solutions
>>>>> Find What Matters Most in Your Big Data with HPCC Systems
>>>>> Open Source. Fast. Scalable. Simple. Ideal for Dirty Data.
>>>>> Leverages Graph Analysis for Fast Processing & Easy Data Exploration
>>>>> http://p.sf.net/sfu/hpccsystems
>>>>> _______________________________________________
>>>>> dotNetRDF-bugs mailing list
>>>>> dot...@li...
>>>>> https://lists.sourceforge.net/lists/listinfo/dotnetrdf-bugs
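The repro described in the thread can be sketched as a small dotNetRDF console program. This is a hypothetical reconstruction, not the contents of the linked Bitbucket repo: `TripleStore.ExecuteUpdate` and `NQuadsWriter` are real dotNetRDF API members from the 1.0.x era, but the exact `Save` overloads varied between releases, so treat this as an illustration of the three steps rather than a verified test case.

```csharp
// Sketch of the reported repro, assuming dotNetRDF 1.0.x.
// The writer Save() overload taking a filename is an assumption;
// some releases required a TextWriter or store-params object instead.
using System;
using System.IO;
using VDS.RDF;
using VDS.RDF.Writing;

class InsertDataRepro
{
    static void Main()
    {
        // 1. Create an empty in-memory TripleStore.
        var store = new TripleStore();

        // 2. Insert quads into two named graphs via SPARQL Update.
        //    Per the thread, this also pre-creates an empty default graph,
        //    unlike LOAD <x>, which only creates graphs named in the source.
        store.ExecuteUpdate(@"
            INSERT DATA {
              GRAPH <http://test.org/user> {
                <http://test.org/user>
                  <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
                  <http://schema.org/Person> .
                <http://test.org/user> <http://some/ontology/favorite>
                  <http://test.org/product/name> .
              }
              GRAPH <http://test.org/prodList/> {
                <http://test.org/user> <http://xmlns.com/foaf/0.1/primaryTopic>
                  <http://test.org/user> .
              }
            }");

        // 3. Serialize the whole store to NQuads.
        //    Expected: each triple carries its graph name.
        //    Reported bug: all triples come out in the default graph.
        var writer = new NQuadsWriter();
        writer.Save(store, "out.nq");
        Console.WriteLine(File.ReadAllText("out.nq"));
    }
}
```

If the bug is present, the output lines have only subject, predicate, and object, with no fourth (graph) term, even though enumerating `store.Graphs` in memory shows both named graphs populated correctly.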