make yoga threadsafe #852
Reference in New Issue
Block a user
No description provided.
Delete Branch "yoga_threadsafe"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Continuing https://github.com/facebook/yoga/pull/791
@nokia6686 is a former member of our team, so we are trying to pick up what he left and carry out the pull request.
Solution
Improved from previous solution with @jpap's suggestions.
gNodeInstanceCount
andgConfigInstanceCount
gDepth
andgCurrentGenerationCount
(renamed to depth and generationCount respectively) between function calls that stem fromYGNodeCalculateLayout
.In
YGNodeCalculateLayout
, passdepth
as value 0, to indicate the root depth.#define
withgPrintChanges
andgPrintSkips
to optimized out unnecessary logging and incrementing depth state.YGConfigGetDefault
Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please sign up at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need the corporate CLA signed.
If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!
The check also failed with master branch with the same error (https://travis-ci.org/facebook/yoga/builds/475317585), I think it should be irrelevant to this pull request.
Thanks for taking the time to refactor the PR, @tcdat96. Some further comments given.
The {
computedFlexBasisGeneration
,generationCount
} members of theYGLayout
type should be updated to 16-bit unsigned.This is probably OK, but it is safer to use
#define XXX 0
here for safer compatibility against various pre-processors, that look for a "C expression of integer type" (gcc) or "must have integral type and can include only integer constants, character constants, and the defined operator" (msvc).Alternatively, you could use
#define XXX FALSE
whereFALSE
is suitably defined after an#if !defined(FALSE)
check in a central header.I suspect the
generationCount
should remain as a member of theYGNodeRef
, and incremented on each invocation ofYGNodeCalculateLayout(...)
so that subsequent calls can reuse computed layout when things don't change (given by an equal generation). Subsequently passing the new generation counter around by function call as you've done is a good idea, to avoid the pointer-to-root situation of the last PR.It would be better to use
int
for the depth type, so as not to limit to a tree height of 65,535. (The generation count, a separate thing, is likely OK to wrap at 16-bits.)To simplify this further you might consider removing the variable declaration, and writing it directly as an argument when invoking
YGLayoutNodeInternal
below. For example:I've refactor other points according to your advice, but I find this comment a little confusing.
As far as I understand,
generationCount
should remain as a member of theYGNodeRef
, then inYGNodeCalculateLayout()
, it should be passed to subsequent calls as a pointer and incremented as needed. But with this approach, we will have to store the root reference in every node, and perform a full subtree traversal withsetChildRoot
, the same as the last PR, which would be quite expensive as you said. How exactly can we handle this issue, @jpap?My apologies; what I meant there was:
YGNodeRef
, with a per-nodeuint16_t
two-byte overhead: let's refer to it asgenerationCount_
below.YGNodeCalculateLayout
, we're passing the root node for the subtree to be laid out, so we can immediately increment (update)generationCount_
of that root node, and then assign the newly incremented value to thegenerationCount
local variable you have defined, instead of always starting from the zero value. (There is a second increment in the same function, where we would do the same as above.)generationCount
as the existing global-variable based implementation. That's nice.generationCount
, which is also nice because each tree is now separate. (Thread-safety also follows.)generationCount
, likely zero. This may result in some nodes being recomputed as the generation now differs. If this is an issue, we could updateYGNodeClone
so as to follow theYGNode.owner_
path up to the root, to copy the rootgenerationCount
when a node is cloned. Either way, such overhead (re-computation or traversal) is limited to cloning. (I did not consider this earlier; your thoughts here are appreciated.)generationCount
local variable inYGNodeCalculateLayout
, we can pass it around on the stack, as the layout proceeds down the tree, thus avoiding the pointer-to-the-root issue.What you need here is simply a non-zero integer value.
Please take a look at the latest commit, @jpap. When cloning nodes, I went with the traversal approach, thought that would be much cheaper than re-computing the differing nodes.
If there's still any issue, please let me know.
Looks like this loop always terminates early: the parent is set to
nullptr
above the loop on line 242.Might be a good idea to add a test? ;-)
Oh my apologies, thought it just a simple loop, so didn't test it thoroughly :-(
Okay, let me add a test 👍
No worries.
Thinking about this a bit more, I'm not sure we need to do this extra work on clone: the generation is incremented each time we perform a
YGNodeCalculateLayout
, so even if we preserved the root generation counter for a cloned subtree, we're still incrementing it on a call toYGNodeCalculateLayout
for that subtree and before we proceed with any work. Any comparison to the old generation count will be false.Is the counter used to help avoid node recalculations during multiple passes of a given layout (generation) calculation? I'd need to spend some time to really understand how the generation counter is used for this; and unfortunately the code isn't terribly straightforward or self-documenting. If you're on the FB team, can you ask the original author for a short-cut?
It might be that all is needed is for the generation counter to be monotonically increase on each call to
YGNodeCalculateLayout
, which means we'll still have to stash it in the root node (if so, the clone stuff should remain). What happens when it wraps to zero?Unfortunately we aren't from the FB team and neither do we have a full understanding how generetionCount works. My team encounter the race condition problem several months ago, so we opened the previous pull request to try to fix that.
I actually thought you are the one on the FB team ;-)
I would suggest us to stick to the original idea of #791 because it focus on solving the data race condition issue without worrying about optimizing the
generationCount
itself. This can be done in a separated PR and leave this PR with minimal changes.That would be unwise because #791:
I am reading through the code to understand how the generation is treated and will report back shortly.
Here's what I've learned from reading the code...
There are two generation counters:
generationCount
andcomputedFlexBasisGeneration
stored inYGLayout
, which are compared to the globalgCurrentGenerationCount
. An overview of each is given below.At the end, I'll provide a few options on how to move forward with this PR. Apologies for the length in advance.
YGLayout::generationCount
YGLayoutNodeInternal
when a node is dirty, and the node has not been visited thus far for a givenYGNodeCalculateLayout
call.YGNodeCalculateLayout
, thegCurrentGenerationCount
is incremented a second time before performing the optional a legacy "stretch behavior" comparison on a cloned tree that has all nodes marked dirty. The newgCurrentGenerationCount
here is to ensure the layout cache is cleared as above.YGNodeCalculateLayout
call.YGLayout::computedFlexBasisGeneration
YGNodeComputeFlexBasisForChild
.YGExperimentalFeatureWebFlexBasis
feature is in use.YGNodeCalculateLayout
call.gCurrentGenerationCount
YGNodeCalculateLayout
before performing a full-tree layout.uint8_t
) for the global counter and associated class member storage. For cloning, what is important is that we use a value different to the "last time".Moving forward
Some options:
YGNodeClone
change.generationCount_
member fromYGNode
and bring back the globalgCurrentGenerationCount
.gCurrentGenerationCount
atomically inYGNodeCalculateLayout
and use it to seed thegenerationCount
stack variable that is passed around by function call.depth
andPRINT_XXX
.uint8_t
type for the generation counters.gCurrentGenerationCount
global.computedFlexBasisGeneration
andgenerationCount
members ofYGLayout
.std::map<YGNode,uint8_t>
stack variable inYGNodeCalculateLayout
that is passed around likedepth
is in the current PR. (Or to minimize function call overhead, thedepth
and a pointer to the map can be placed in a struct, and a pointer to that struct is passed around instead.)uint8_t
value field of the map is a bitfield:layout->generationCount != gCurrentGenerationCount
check inYGLayoutNodeInternal
.computedFlexBasisGeneration != gCurrentGenerationCount
check inYGNodeComputeFlexBasisForChild
.YGNodeCalculateLayout
.What do you think?
The implementation for the first and second options would be quite straightforward, but I'm having a hard time understanding the third optimization. How exactly can we substitute those 3 variables with the map approach? Personally, I find it really risky to make such a big change, and we don't really have the confidence in our understanding of Yoga to carry out the implementation by ourselves neither.
I've submitted PR #855 that takes the third approach: it appears to work well!
So from here, let's see if @davidaurelio or others on the FB team can offer their input. @tcdat96, in the meantime if you have time to update your PR with option 2, it can give others an opportunity to review both PRs and weigh in.
Hopefully we can land this thread-safety fix soon. I'm happy to amend my PR if there are preferences on style, etc.
I updated the PR with the first 2 options, but ran into error
undefined reference to `pthread_getspecific'
when running gtest. Have you ever encountered a similar problem, @jpap?Sadly I've never experienced that; sounds like a linker configuration problem? Perhaps you've inadvertently modified some of the build configuration someplace. I would try:
I've already tried both but unfortunately nothing works. The error appeared even with the master commit so I think it should be my configuration problem, but sadly I don't really have any experience in this type of problem :-(
It seems that the error only emerges in our ubuntu machines, the same source code runs without any issue in MacOS.
I modified the BUCK file in lib/gtest with the workaround described in https://github.com/facebook/buck/issues/1443 and it worked. Now this PR runs successfully with all tests passed.
If you have the time, please take a look at both PRs, @davidaurelio :-)
Thank you for your contribution. I see good things in here, e.g. passing generation count and depth along as function parameters.
However, I don’t think we really need atomics here. Not using them wouldn’t affect correctness, and avoid the potential bus overhead.
is awesome
please use
constexpr
– check my inline comments. And please put it in a separate pull request. I will happily merge it.see inline comment – I don’t agree with the notion of a leak, and would like to retain the behaviour of having a single default config. If you feel strongly about running a destructor when terminating the process, it could be wrapped into a static
unique_ptr
. But I don’t think we have to do this at all.@@ -24,3 +24,3 @@
uint32_t computedFlexBasisGeneration = 0;
uint8_t computedFlexBasisGeneration = 0;
YGFloatOptional computedFlexBasis = {};
What was the motivation for this change? Due to alignment requirements, this won’t even save memory without reordering data members.
This would get exposed to every importer of
Yoga.h
. Let’s not do this here, please.While this serialises the access to these variables, it causes additional synchronisation needs between cores when creating nodes on different threads in parallel.
At this point, I’d rather let the count go out of sync, instead of adding any bus overhead. Its accuracy isn’t a correctness problem, is it?
Is this to avoid the “leak”? This is a statically constructed instance that will live until the process exits, which seems perfectly fine. Could you explain which problem you are trying to solve?
Again, this causes additional synchronisation, and I am not sure it’s necessary:
this will usually be loaded into a register, incremented, and written back to memory.
When two threads do this in parallel, they might both load the old value, increment by one, and write back – with the global value being incremented by one, not two, as a result.
That will still increment the count, and cause layouts to be run – there isn’t really a need to synchronise it.
Am I missing something?
Minor nitpick, please use uniform initialization (
std::atomic<uint8_t> gCurrentGenerationCount{0};
)constexpr
, please. E.g.constexpr bool kPrintChanges = false;
regular
if (kPrintChanges)
move to
Yoga.cpp
a) it’s not used here
b) this header needs to be valid C
Honestly, I don't have the full knowledge about how these variables could affect the correctness of Yoga. These variables were flagged by ThreadSanitizer to be a possible data race problem, so I only tried to fix it at the language level.
But if its accuracy is not important to the program's flow, then what's the point of having these variables?
These instance tracking variables are only used for tests. You could conditionally compile in this functionality for tests by adding a new preprocessor token in the
BUCK
file, underTEST_COMPILER_FLAGS
.These instance tracking variables are only used for tests. You could conditionally compile in this functionality for tests by adding a new preprocessor token in the
BUCK
file, underTEST_COMPILER_FLAGS
.@@ -24,3 +24,3 @@
uint32_t computedFlexBasisGeneration = 0;
uint8_t computedFlexBasisGeneration = 0;
YGFloatOptional computedFlexBasis = {};
The motivation is to save heap: there are many bits left over from alignment of the bitfields above it, so you could combine
computedFlexBasisGeneration
with it. I suspect you can get away with just one bit, so long as the generation counter is the same width (or is truncated with a logical AND mask when the generation is incremented).I'll switch to constexpr in another pull request, so we won't need this anymore
This is actually my misunderstanding about how static variables work. This was marked by CppUTest as a memory leak but it appears to be only a false alarm. I'll revert this.
@davidaurelio @jpap Sorry for delaying this for too long, I was a little too busy last month.
In the new update, I've:
constexpr
in another pull requestSo this pull request will only have 2 changes now:
generationCount
depth
andgenerationCount
via function parameters, instead of using global variables.any update, @davidaurelio ?
@davidaurelio @jpap hello?
Hey all - this is still flagging our thread-sanitizer runs, which makes it impossible to find real issues since it happens so frequency. It looks bad for Yoga's usage at a certain big company :).
Could the team find some way to fix or silence the warning with annotations? cc @davidaurelio
Our team had also previously opened this PR (sat for ~2 months): https://github.com/facebook/yoga/pull/770
@davidaurelio it's been several months. Could you take a look at this?
Hi folks, sorry for being unresponsive.
The aspect of this creating too much noise in sanitizers is new to me, sorry about that. I thought this was mainly due to concerns about the data races themselves.
Let me see how well this merges …
@davidaurelio has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Pull request closed